New Features in HDF5
SPEEDUP Workshop - HDF5 Tutorial, September 9, 2008

Presentation transcript:

1 New Features in HDF5

2 Why new features?

3 Why new features?
- HDF5 1.8.0 was released in February 2008, a major update of the HDF5 1.6.* series (a stable set of features and APIs since 1998)
- New features: about 200 new APIs, changes to the file format, changes to existing APIs; backward compatible
- New releases in November 2008: HDF5 1.6.8 and 1.8.2, with minor bug fixes and support for new platforms and compilers

4 Information about the release: http://www.hdfgroup.org/HDF5/doc/ (follow the “New Features and Compatibility Issues” links)

5 Why new features?
Need to address deficiencies in the initial design. Examples:
- Large overhead in file sizes
- Non-tunable metadata cache implementation
- Handling of free space in a file

6 Why new features?
Need to address new requirements. Add support for:
- New types of indexing (object creation order)
- Large volumes of variable-length data (DNA sequences)
- Simultaneous real-time streams (fast append to one-dimensional datasets)
- UTF-8 encoding for objects’ path names
- Accessing objects stored in other HDF5 files (external or user-defined links)

7 Outline
- Dataset and datatype improvements
- Group improvements
- Link revisions
- Shared object header messages
- Metadata cache improvements
- Error handling
- Backward/forward compatibility
- HDF5 and NetCDF-4

8 Dataset and Datatype Improvements

9 Text-based datatype descriptions
Why:
- Simplify datatype creation
- Make datatype-creation code more readable
- Facilitate debugging by printing the text description of a datatype
What: New routines to create an HDF5 datatype from a text description, and to get a text description from an HDF5 datatype.

10 Text datatype description
Example:
/* Create the datatype from a DDL text description */
dtype = H5LTtext_to_dtype("H5T_IEEE_F32BE\n", H5LT_DDL);

/* Convert the datatype back to text: first query the length,
   then allocate and fill the string */
H5LTdtype_to_text(dtype, NULL, H5LT_DDL, &str_len);
dt_str = (char*)calloc(str_len, sizeof(char));
H5LTdtype_to_text(dtype, dt_str, H5LT_DDL, &str_len);

11 Serialized datatypes and dataspaces
Why:
- Allow datatype and dataspace information to be transmitted between processes
- Allow datatypes/dataspaces to be stored in non-HDF5 files
What: A new set of routines to serialize/deserialize HDF5 datatypes and dataspaces.

12 Serialized datatypes and dataspaces
Example:
/* Find the buffer length and encode a datatype into the buffer */
status = H5Tencode(t_id, NULL, &cmpd_buf_size);
cmpd_buf = (unsigned char*)calloc(1, cmpd_buf_size);
H5Tencode(t_id, cmpd_buf, &cmpd_buf_size);

/* Decode a binary description of a datatype and return a datatype handle */
t_id = H5Tdecode(cmpd_buf);

13 Integer-to-float conversion during I/O
Why:
- HDF5 1.6 and earlier supported conversion only within the same class (16-bit integer ↔ 32-bit integer, 64-bit float ↔ 32-bit float)
- Conversion between classes is needed to support the NetCDF-4 programming model
What: Integer-to-float conversion is now supported during I/O.

14 Integer-to-float conversion during I/O
Example: conversion is transparent to the application
/* Create a dataset of 64-bit little-endian floating-point type */
dset_id = H5Dcreate(loc_id, "Mydata", H5T_IEEE_F64LE, space_id,
                    H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);
/* Write integer data to "Mydata"; the library converts it on the fly */
status = H5Dwrite(dset_id, H5T_NATIVE_INT, …);

15 Revised conversion exception handling
Why:
- Give applications greater control over exceptions (range errors, etc.) during datatype conversion
- Needed to support the NetCDF-4 programming model
What: Revised conversion exception handling.

16 Revised conversion exception handling
To handle exceptions during conversion, register a handling function with H5Pset_type_conv_cb().
Exception cases:
- H5T_CONV_EXCEPT_RANGE_HI
- H5T_CONV_EXCEPT_RANGE_LOW
- H5T_CONV_EXCEPT_TRUNCATE
- H5T_CONV_EXCEPT_PRECISION
- H5T_CONV_EXCEPT_PINF
- H5T_CONV_EXCEPT_NINF
- H5T_CONV_EXCEPT_NAN
Return values: H5T_CONV_ABORT, H5T_CONV_UNHANDLED, H5T_CONV_HANDLED
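The idea behind the callback can be sketched in plain C. This is an illustration only, not the actual H5T_conv_except_func_t signature: the enum names mirror the HDF5 constants above, but the converter, callback type, and `clamp_cb` handler are made up for the example.

```c
#include <assert.h>
#include <limits.h>

/* Toy model: a converter calls a user callback when a value does not
 * fit in the destination type, and the callback decides the outcome
 * (here it clamps instead of failing). */
typedef enum { EXCEPT_RANGE_HI, EXCEPT_RANGE_LOW } except_t;
typedef enum { CONV_ABORT, CONV_UNHANDLED, CONV_HANDLED } conv_ret_t;

typedef conv_ret_t (*conv_cb_t)(except_t kind, int *dst);

/* A handler that clamps out-of-range values to INT_MAX/INT_MIN. */
static conv_ret_t clamp_cb(except_t kind, int *dst)
{
    *dst = (kind == EXCEPT_RANGE_HI) ? INT_MAX : INT_MIN;
    return CONV_HANDLED;
}

/* Convert a long long to int, consulting the callback on range errors. */
static int convert_ll_to_int(long long v, conv_cb_t cb, int *out)
{
    if (v > INT_MAX) return cb(EXCEPT_RANGE_HI, out) == CONV_HANDLED ? 0 : -1;
    if (v < INT_MIN) return cb(EXCEPT_RANGE_LOW, out) == CONV_HANDLED ? 0 : -1;
    *out = (int)v;
    return 0;
}
```

With a CONV_UNHANDLED return the converter would fall back to its default behavior; CONV_ABORT would make the conversion (and the I/O call) fail.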

17 Compression filter for n-bit data
Why: Compact storage for user-defined datatypes.
What:
- When data is stored on disk, padding bits are chopped off and only significant bits are stored
- Supports most datatypes
- Works with compound datatypes

18 N-bit compression example
In memory, one value of an n-bit datatype is stored like this:
| byte 3 | byte 2 | byte 1 | byte 0 |
|????????|????SPPP|PPPPPPPP|PPPP????|
S - sign bit, P - significant bit, ? - padding bit
After passing through the n-bit filter, all padding bits are chopped off, and the bits are stored on disk like this:
|    1st value    |    2nd value    |
|SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...
The opposite (decompression) happens when going from disk to memory.
Limited to integer and floating-point data.
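The bit arithmetic behind the filter can be sketched in plain C. This is an illustration of the packing step only, not HDF5 filter code; the offset (4) and precision (16) match the layout above, and the function names are made up.

```c
#include <assert.h>
#include <stdint.h>

/* Extract the significant bits of one in-memory value, given the
 * datatype's bit offset and precision (here: offset 4, precision 16,
 * matching |????????|????SPPP|PPPPPPPP|PPPP????| above). */
static uint32_t nbit_pack(uint32_t in_memory, unsigned offset, unsigned precision)
{
    uint32_t mask = (1u << precision) - 1u;   /* `precision` low-order 1-bits */
    return (in_memory >> offset) & mask;      /* drop padding, keep S+P bits */
}

/* Re-expand a packed value into its in-memory position; the padding
 * bits come back as zeros (their original content is not preserved). */
static uint32_t nbit_unpack(uint32_t packed, unsigned offset, unsigned precision)
{
    uint32_t mask = (1u << precision) - 1u;
    return (packed & mask) << offset;
}
```

On disk, consecutive packed values are then concatenated bit-by-bit, which is where the actual space saving comes from.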

19 N-bit compression example
Example:
/* Create an n-bit datatype: 16 significant bits starting at bit 4 */
dt_id = H5Tcopy(H5T_STD_I32LE);
H5Tset_precision(dt_id, 16);
H5Tset_offset(dt_id, 4);
/* Create and write a dataset */
dcpl_id = H5Pcreate(H5P_DATASET_CREATE);
H5Pset_chunk(dcpl_id, …);
H5Pset_nbit(dcpl_id);
dset_id = H5Dcreate(…, …, …, …, …, dcpl_id, …);
H5Dwrite(dset_id, …, …, …, …, buf);

20 Scale+offset storage filter
Why: Use less storage when less precision is needed.
What:
- Performs a scale/offset operation on each value
- Truncates the result to fewer bits before storing
- Currently supports integers and floats
- Precision may be lost

21 Example with floating-point type
Data: {104.561, 99.459, 100.545, 105.644}
Choose a scaling factor: the number of decimal digits to keep, e.g. scale factor D = 2.
1. Find the minimum value (offset): 99.459
2. Subtract the minimum value from each element. Result: {5.102, 0, 1.086, 6.185}
3. Scale the data by multiplying by 10^D = 100. Result: {510.2, 0, 108.6, 618.5}
4. Round the data to integers. Result: {510, 0, 109, 619}
5. Pack and store using the minimum number of bits

22 Scale+offset storage filter
Example:
/* Use the scale+offset filter on integer data; let the library figure
   out the minimum number of bits necessary to store the data without
   loss of precision */
H5Pset_scaleoffset(dcpl_id, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);
H5Pset_chunk(dcpl_id, …, …);
dset_id = H5Dcreate(…, …, …, …, …, dcpl_id, …);

/* Use the scale+offset filter on floating-point data; compression may
   be lossy */
H5Pset_scaleoffset(dcpl_id, H5Z_SO_FLOAT_DSCALE, 2);

23 “NULL” dataspace
Why:
- Allow datasets with no elements to be described
- NetCDF-4 needed a “placeholder” for attributes
What: A dataset with no dimensions and no data.

24 NULL dataspace
Example:
/* Create a dataset with a “NULL” dataspace */
sp_id = H5Screate(H5S_NULL);
dset_id = H5Dcreate(…, "IntArray", …, sp_id, …, …, …);

h5dump output for the resulting file "SDS.h5":
HDF5 "SDS.h5" {
GROUP "/" {
   DATASET "IntArray" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  NULL
      DATA {
      }
   }
}
}

25 HDF5 file format revision

26 HDF5 file format revision
Why:
- Address deficiencies of the original file format
- Address space overhead in an HDF5 file
- Enable new features
What: A new routine that instructs the HDF5 library to create all objects using the latest version of the HDF5 file format (as opposed to the earliest version in which an object became available, e.g. the array datatype). Versioning is discussed later.

27 HDF5 file format revision
Example:
/* Use the latest version of the file format for each object created
   in a file */
fapl_id = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_latest_format(fapl_id, 1);
fid = H5Fcreate(…, …, …, fapl_id);
or
fid = H5Fopen(…, …, fapl_id);

28 Group Revisions

29 Better large-group storage
Why: Faster, more scalable storage and access for large groups.
What: A new format and method for storing groups with many links.

30 Informal benchmark
- Create a file and a group in the file
- Create up to 10^6 groups with one dataset in each group
- Compare file sizes and performance of HDF5 1.8.1 using the latest group format against HDF5 1.8.1 (default, old format) and 1.6.7
Note: Default 1.8.1 and 1.6.7 became very slow after 700,000 groups.

31 Time to open and read a dataset (chart)

32 Time to close the file (chart)

33 File size (chart)

34 Access links by creation-time order
Why:
- Allow iteration and lookup of a group’s links (children) by creation order as well as by name order
- Support the netCDF access model for NetCDF-4
What: Option to access objects in a group according to relative creation time.

35 Access links by creation-time order
Example:
/* Track and index creation order of the links */
H5Pset_link_creation_order(gcpl_id,
        (H5P_CRT_ORDER_TRACKED | H5P_CRT_ORDER_INDEXED));
/* Create a group */
gid = H5Gcreate(fid, GNAME, H5P_DEFAULT, gcpl_id, H5P_DEFAULT);

36 Example: h5dump --group=1 tordergr.h5 (sorted by name, the default)
HDF5 "tordergr.h5" {
GROUP "1" {
   GROUP "a" {
      GROUP "a1" { }
      GROUP "a2" {
         GROUP "a21" { }
         GROUP "a22" { }
      }
   }
   GROUP "b" { }
   GROUP "c" { }
}
}

37 Example: h5dump --sort_by=creation_order tordergr.h5
HDF5 "tordergr.h5" {
GROUP "1" {
   GROUP "c" { }
   GROUP "b" { }
   GROUP "a" {
      GROUP "a1" { }
      GROUP "a2" {
         GROUP "a22" { }
         GROUP "a21" { }
      }
   }
}
}

38 “Compact groups”
Why:
- Save space and access time for small groups
- If groups are small, B-tree overhead is not needed
What:
- Alternate storage for groups with few links
- Default storage when the “latest format” is specified
- The library converts to “original” storage (B-tree based) using a default or user-specified threshold

39 “Compact groups”
Example: a file with 11,600 groups
- With the original group structure, file size ~20 MB
- With compact groups, file size ~12 MB
- Total savings: 8 MB (40%)
- Average savings per group: ~700 bytes

40 Compact groups
Example:
/* Switch to “dense” storage if the number of group members exceeds 16,
   and go back to compact storage if it drops below 12 */
H5Pset_link_phase_change(gcpl_id, 16, 12);
/* Create a group */
g_id = H5Gcreate(…, …, …, gcpl_id, …);

41 Intermediate group creation
Why:
- Simplify creation of a series of nested groups
- Avoid having to create each intermediate group separately, one by one
What: Intermediate groups can be created when creating an object in a file, with one function call.

42 Intermediate group creation
Want to create “/A/B/C/dset1”. Group “A” exists, but “B”, “C”, and “dset1” do not.
Before: the file contains only /A.
After: the file contains /A/B/C/dset1.
One call creates groups “B” and “C”, then creates “dset1”.

43 Intermediate group creation
Example:
/* Create a link creation property list */
lcrp_id = H5Pcreate(H5P_LINK_CREATE);
/* Set the flag for intermediate group creation;
   groups B and C will be created automatically */
H5Pset_create_intermediate_group(lcrp_id, TRUE);
ds_id = H5Dcreate(file_id, "/A/B/C/dset1", …, …, lcrp_id, …, …);

44 Link Revisions

45 What are links?
- Links connect groups to their members
- “Hard” links point to a target by address
- “Soft” links store the path to a target
(Diagram: a root group with a hard link pointing directly to a dataset, and a soft link storing the path “/target dataset”.)

46 New: External links
Why: Access objects stored in other HDF5 files transparently.
What:
- Store the location of a file and a path within that file
- Can link across files

47 New: External links
(Diagram: file1.h5 contains an external link object “External_link” storing the values “file2.h5” and “/A/B/C/D/E”.)
The external link object “External_link” in file1.h5 points to the group /A/B/C/D/E in file2.h5.

48 External links
Example:
/* Create an external link */
H5Lcreate_external(TARGET_FILE, "/A/B/C/D/E",
                   source_file_id, "External_link", …, …);
/* Use the external link to create a group in the target file */
gr_id = H5Gcreate(source_file_id, "External_link/F", …, …, …, …);
/* We can now access group "External_link/F" in the source file,
   which is group "/A/B/C/D/E/F" in the target file */

49 New: User-defined links
Why: Allow applications to create their own kinds of links and link operations, such as:
- A “hard” external link that finds an object by address
- A link that accesses a URL
- Keeping track of how often a link is accessed, or other behavior
What: Applications can create new kinds of links by supplying custom callback functions; such links can do anything HDF5 hard, soft, or external links do.

50 Traversing an HDF5 file

51 Traversing an HDF5 file
Why: Allow applications to iterate through the objects in a group, or to visit all objects under a group recursively.
What:
- New APIs to traverse a group hierarchy
- New APIs to iterate through a group using different types of indices (name or creation order)
- H5Giterate is deprecated in favor of the new functions

52 Traversing an HDF5 file
Examples of some new APIs:
/* Check whether object "A/B" exists in the root group */
H5Lexists(file_id, "A/B", …);
/* Iterate through the members of the root group using name as the
   index; this function does not recursively follow links into
   subgroups */
H5Literate(file_id, H5_INDEX_NAME, H5_ITER_INC, &idx, iter_link_cb, &info);
/* Visit all objects under the root group; this function recursively
   follows links into subgroups */
H5Lvisit(file_id, H5_INDEX_NAME, H5_ITER_INC, visit_link_cb, &info);

53 Traversing an HDF5 file
Things to remember:
- Never call H5Ldelete inside an HDF5 iterate or visit callback function
- Always close a parent object before deleting a child object

54 Shared Object Header Messages

55 Shared object header messages
Why: Metadata is duplicated many times, wasting space.
Example: You create a file with 10,000 datasets, all using the same datatype and dataspace. HDF5 needs to write this information 10,000 times!
(Diagram: datasets 1, 2, and 3, each storing its own copy of the datatype and dataspace messages alongside its data.)

56 Shared object header messages
What: Enable messages to be shared automatically; HDF5 shares duplicated messages on its own.
(Diagram: dataset 2 reuses the datatype and dataspace messages stored with dataset 1.)

57 Shared messages
- Happens automatically
- Works with datatypes, dataspaces, attributes, fill values, and filter pipelines
- Saves space if these objects are relatively large
- May be faster if HDF5 can cache shared messages
Drawbacks:
- Usually slower than non-shared messages
- Adds overhead to the file: an index for storing shared messages, plus 25 bytes per instance
- Older library versions can’t read files with shared messages

58 Two informal tests
File with 24 datasets, all with the same large datatype:
- 26,000 bytes normally
- 17,000 bytes with shared messages enabled
- Saves about 375 bytes per dataset
But make a bad decision: enable shared messages yet create only one dataset…
- 9,000 bytes normally
- 12,000 bytes with shared messages enabled
- Probably slower when reading and writing, too
Moral: shared messages can be a big help, but only in the right situation!

59 Error Handling

60 Extensible error-handling APIs
Why: Enable applications to integrate their error reporting with the HDF5 library error stack.
What: New error-handling APIs:
- H5Epush - push a major and minor error ID onto the specified error stack
- H5Eprint - print the specified stack
- H5Ewalk - walk through the specified stack
- H5Eclear - clear the specified stack
- H5Eset_auto - turn error printing on/off for the specified stack
- H5Eget_auto - return the traversal settings for the specified stack

61 Error-handling programming model
1. Create a new class with major and minor error messages
2. Register the messages with the HDF5 library
3. Manage errors:
   - Use the default error stack or create a new one
   - Push errors
   - Print the error stack
   - Close the stack

62 Error-handling example
#define ERR_CLS_NAME "Error Test"
#define PROG_NAME "Error Program"
#define PROG_VERS "1.0"
……
#define ERR_MAJ_TEST_MSG "Error in test"
#define ERR_MIN_MYFUNC_MSG "Error in my function"
……
/* Initialize error information for the application */
ERR_CLS = H5Eregister_class(ERR_CLS_NAME, PROG_NAME, PROG_VERS);
ERR_MAJ_TEST = H5Ecreate_msg(ERR_CLS, H5E_MAJOR, ERR_MAJ_TEST_MSG);
ERR_MIN_MYFUNC = H5Ecreate_msg(ERR_CLS, H5E_MINOR, ERR_MIN_MYFUNC_MSG);
……
/* Unregister the class (and its major/minor message handles) when done */
H5Eunregister_class(ERR_CLS);

63 Error-handling example
/* This function creates and writes a dataset */
static herr_t my_function(hid_t fid)
{
    ……
    /* Force this function to fail and make it push an error */
    H5E_BEGIN_TRY {
        dataset = H5Dcreate1(FAKE_ID, DSET_NAME, H5T_STD_I32BE,
                             space, H5P_DEFAULT);
    } H5E_END_TRY;
    if (dataset < 0) {
        H5Epush(H5E_DEFAULT, __FILE__, FUNC_my_function, __LINE__,
                ERR_CLS, ERR_MAJ_IO, ERR_MIN_CREATE, "H5Dcreate failed");
        goto error;
    } /* end if */
    ……

64 Error-handling example
Error Test-DIAG: Error detected in Error Program (1.0) thread 0:
  #000: error_example.c line 160 in main(): Error stack test failed
    major: Error in test
    minor: Error in my function
  #001: error_example.c line 100 in my_function(): H5Dcreate failed
    major: Error in IO
    minor: Error in H5Dcreate
HDF5-DIAG: Error detected in HDF5 (1.8.1) thread 0:
  #002: H5Ddeprec.c line 154 in H5Dcreate1(): not a location ID
    major: Invalid arguments to routine
    minor: Inappropriate type
  #003: H5Gloc.c line 241 in H5G_loc(): invalid object ID
    major: Invalid arguments to routine
    minor: Bad value

65 Metadata cache

66 HDF5 metadata
- Metadata: extra information about the user’s data
- Two types of metadata: structural and user-defined
- Structural metadata stores information about how the user’s data is organized. When you create a group, you really create:
  - a group header,
  - a B-tree (to index entries), and
  - a local heap (to store entry names)

67 HDF5 metadata
- User-defined metadata (for example, created via the H5A calls)
- Usually small: less than 1 KB
- Accessed frequently
- Small disk accesses are still expensive

68 Overview of HDF5 metadata cache
Scenario | Working set size (subset of metadata in use) | Metadata cache accesses
Create datasets A, B, C, D with 10^6 chunks under the root group | < 1 MB | < 50K
Initialize the chunks round-robin (1 from A, 1 from B, 1 from C, 1 from D, repeat until done) | < 1 MB | ~30M
10^6 random accesses across A, B, C, and D | ~120 MB | ~4M
10^6 random accesses to A only | ~40 MB | ~4M

69 HDF5 metadata cache
Challenges peculiar to metadata caching in HDF5:
- Varying metadata entry sizes: most entries are less than a few hundred bytes, but entries may be of almost any size; variations from a few bytes to megabytes have been encountered
- Varying working set sizes: < 1 MB for most applications most of the time, but ~8 MB for an astrophysics simulation code
- The metadata cache competes with the application for memory: the cache must be big enough to hold the working set, but should never be significantly bigger, lest it starve the user program of memory

70 Metadata cache in HDF5 1.6.3 and before
A simple hash table:
- Fast
- But no provision for collisions: an entry is evicted on collision
- For a small hash table, performance is bad, since frequently accessed entries can hash to the same location
- Good performance requires a big hash table: inefficient use of memory
- Unsustainable as HDF5 file size and complexity increase

71 Metadata cache in HDF5 1.6.4 and 1.6.5
- Entries are stored in a hash table as before; collisions are handled by chaining
- An LRU list is maintained to select candidates for eviction
- A running sum of entry sizes is maintained; entries are evicted when a predefined limit on this sum is reached
- The size of the metadata cache is therefore bounded, but the bound is hard-coded to 8 MB
- This doesn’t work when the working set is bigger, and larger variations in working set sizes are anticipated
- Manual control over the cache size is needed!

72 Metadata cache in HDF5 1.6.4 and 1.6.5
(Diagram: metadata entries stored in a hash table with chaining, with every entry also threaded onto an LRU list that supplies eviction candidates.)
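The 1.6.4/1.6.5 scheme described above (hash table with chaining for lookup, an LRU list for eviction, and a bound on the running sum of entry sizes) can be sketched as a toy C cache. All names, the bucket count, and the size limit are illustrative, not HDF5 internals.

```c
#include <stdlib.h>

#define NBUCKETS 64
#define CACHE_LIMIT 1024   /* stand-in for the hard-coded 8 MB bound */

typedef struct entry {
    unsigned long addr;        /* file address, used as the key */
    size_t size;               /* size of the cached metadata */
    struct entry *hash_next;   /* chain within a hash bucket */
    struct entry *lru_prev, *lru_next;
} entry;

static entry *buckets[NBUCKETS];
static entry *lru_head, *lru_tail;  /* head = most recently used */
static size_t total_size;

static void lru_unlink(entry *e)
{
    if (e->lru_prev) e->lru_prev->lru_next = e->lru_next; else lru_head = e->lru_next;
    if (e->lru_next) e->lru_next->lru_prev = e->lru_prev; else lru_tail = e->lru_prev;
}

static void lru_push_front(entry *e)
{
    e->lru_prev = NULL;
    e->lru_next = lru_head;
    if (lru_head) lru_head->lru_prev = e; else lru_tail = e;
    lru_head = e;
}

/* Access (and, if absent, insert) the entry at `addr`; evict from the
 * LRU tail until the running size sum is back under the limit. */
static void cache_access(unsigned long addr, size_t size)
{
    unsigned b = (unsigned)(addr % NBUCKETS);
    entry *e;

    for (e = buckets[b]; e; e = e->hash_next)
        if (e->addr == addr) {          /* hit: just refresh LRU position */
            lru_unlink(e);
            lru_push_front(e);
            return;
        }

    e = calloc(1, sizeof *e);           /* miss: insert a new entry */
    e->addr = addr;
    e->size = size;
    e->hash_next = buckets[b];
    buckets[b] = e;
    lru_push_front(e);
    total_size += size;

    while (total_size > CACHE_LIMIT && lru_tail) {   /* evict LRU victims */
        entry *victim = lru_tail;
        entry **pp = &buckets[victim->addr % NBUCKETS];
        while (*pp != victim) pp = &(*pp)->hash_next;
        *pp = victim->hash_next;        /* unlink from the hash chain */
        lru_unlink(victim);
        total_size -= victim->size;
        free(victim);
    }
}
```

Note the problem the slides point out: CACHE_LIMIT is fixed, so a working set larger than the limit thrashes; the 1.8 cache replaces the constant with an adaptively resized bound.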

73 Metadata cache improvements
Why: Improve I/O performance and memory usage when accessing many objects.
What: New metadata cache APIs to
- control the cache size
- monitor the actual cache size and the current hit rate
Under the hood: adaptive cache resizing automatically detects the current working set size and sets the maximum cache size to match it.

74 Metadata cache improvements
Note: most applications do not need to worry about the cache; see “Special topics” in the HDF5 User’s Guide for details. And if you do see unusual memory growth or poor performance, please contact us. We want to help you.

75 Forward and backward compatibility

76 What do we promise our users?
Backward compatibility: a newer version of the library will always read files created with an older version.
Forward compatibility: an application written for an older version will compile, link, and run as expected with a newer version (requires a compilation flag).
For more information see “API Compatibility Macros in HDF5”: http://www.hdfgroup.org/HDF5/doc/index.html

77 HDF5 1.8.0 file format changes
File format changes:
- Support new features: object creation order, UTF-8 encoding, external links
- Reduce file overhead: new format for global and local heaps
- New group storage: compact groups
- Shared object header messages
These changes are enabled only by using specific APIs; if no new features are requested, the created file can still be read by older versions of the HDF5 library.

78 Example
How can an application create a file that is incompatible with 1.6.*? Here the latest format is used for storing compound datatypes:
fapl = H5Pcreate(H5P_FILE_ACCESS);
H5Pset_latest_format(fapl, TRUE);
file = H5Fcreate(filename, H5F_ACC_TRUNC, H5P_DEFAULT, fapl);
tid = H5Tcreate(H5T_COMPOUND, sizeof(struct s1));
H5Tinsert(…);
dset = H5Dcreate(file, "New compound", tid, ………);
H5Dwrite(dset, …);
……

79 HDF5 1.8.0 API changes
HDF5 uses API versioning:
H5Gcreate1(loc_id, "New/My old group", 0)
H5Gcreate2(loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)
HDF5 uses macros to map a versioned API to a generic one:
H5Gcreate(loc_id, "New/My old group", 0)
H5Gcreate(loc_id, "New/My new group", lcpl_id, gcpl_id, gapl_id)
The mapping is set up at library build time and can be overridden by the application if it is built with special compilation flags.

80 HDF5 1.8.0 API changes
Examples of the new APIs (deprecated → replacement):
- H5Acreate1, H5Aopen1 → H5Acreate2, H5Aopen2
- H5Gcreate1, H5Gopen1 → H5Gcreate2, H5Gopen2
- H5Rget_obj_type1 → H5Rget_obj_type2
The new APIs have more parameters to set up creation and access properties for the objects. Passing H5P_DEFAULT for the new parameters emulates the old behavior:
H5Gcreate(loc_id, "New/My old group", 0)
H5Gcreate(loc_id, "New/My new group", H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT)

81 HDF5 library configuration
Configure flag (global setting) | Public symbols are mapped to | Deprecated symbols available?
--with-default-api-version=v18 (default) | 1.8 (e.g. H5Gcreate is mapped to H5Gcreate2; the old H5Gcreate is H5Gcreate1) | Yes
--disable-deprecated-symbols | 1.8 (e.g. H5Gcreate is mapped to H5Gcreate2; H5Gcreate1 is not available) | No
--with-default-api-version=v16 | 1.6 (e.g. H5Gcreate is mapped to H5Gcreate1; H5Gcreate2 is available) | Yes

82 HDF5 application configuration
Global-level HDF5 API mapping can be done by the application. Assuming both deprecated and new symbols are available in the library:
- h5cc my_program.c
  H5Gcreate1, H5Gcreate2, and H5Gcreate may all be used.
- h5cc -DH5_NO_DEPRECATED_SYMBOLS my_program.c
  Only new symbols are available; H5Gcreate is mapped to H5Gcreate2; the application may use both H5Gcreate2 and H5Gcreate, but cannot use H5Gcreate1.
- h5cc -DH5_USE_16_API my_program.c
  H5Gcreate is mapped to H5Gcreate1; all three of H5Gcreate1, H5Gcreate2, and H5Gcreate can be used.

83 HDF5 application configuration
The version and mapping can also be set per function. Assuming both deprecated and new symbols are available in the library:
- h5cc -DH5Gcreate_vers=1 -DH5Acreate_vers=2 my_program.c
  Maps H5Gcreate to H5Gcreate1 and H5Acreate to H5Acreate2; both H5Gcreate1 and H5Gcreate2 may still be used, and likewise H5Acreate1 and H5Acreate2.
- h5cc -DH5Gcreate_vers=2 my_program.c
  Maps H5Gcreate to H5Gcreate2; both H5Gcreate1 and H5Gcreate2 may be used.
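The per-function mapping above can be sketched with plain preprocessor logic. This is an illustration of the mechanism, not the actual HDF5 headers: `H5Xdemo` is a made-up function name, and the real headers select among the true versioned symbols.

```c
/* Illustrative sketch of versioned-API macro mapping: a per-function
 * version macro selects which concrete function the generic name
 * expands to. */
static const char *H5Xdemo1(void) { return "v1"; }  /* "old" API */
static const char *H5Xdemo2(void) { return "v2"; }  /* "new" API */

/* Default the generic name to the newest version unless the build
 * pinned it to an older one (e.g. with -DH5Xdemo_vers=1). */
#ifndef H5Xdemo_vers
#define H5Xdemo_vers 2
#endif

#if H5Xdemo_vers == 1
#define H5Xdemo H5Xdemo1
#else
#define H5Xdemo H5Xdemo2
#endif
```

Because both concrete functions still exist, code may call H5Xdemo1 or H5Xdemo2 explicitly regardless of where the generic name points, which matches the h5cc behavior described above.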

84 Example: --with-default-api-version=v18
hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Create several groups in the file */
grp1_id = H5Gcreate (file_id, "New/A", H5P_DEFAULT, gcpl_id, gapl_id); /* maps to H5Gcreate2 */
grp2_id = H5Gcreate1(file_id, "/B", 0);
…
grp3_id = H5Gcreate2(file_id, "New/A", H5P_DEFAULT, gcpl_id, gapl_id); /* the same call, written explicitly */

85 Example: --with-default-api-version=v16
hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Create several groups in the file */
grp1_id = H5Gcreate (file_id, "/A", 0); /* maps to H5Gcreate1 */
grp2_id = H5Gcreate1(file_id, "/B", 0);
grp3_id = H5Gcreate2(file_id, "New/C", H5P_DEFAULT, gcpl_id, gapl_id);

86 Example: --disable-deprecated-symbols
hid_t file_id, grp1_id, grp2_id, grp3_id; /* identifiers */
...
/* Open "file.h5" */
file_id = H5Fopen("file.h5", H5F_ACC_RDWR, H5P_DEFAULT);
/* Create several groups in the file */
grp1_id = H5Gcreate (file_id, "New/A", H5P_DEFAULT, gcpl_id, gapl_id);
/* Compilation will fail: H5Gcreate1 does not exist */
grp2_id = H5Gcreate1(file_id, "/B", 0);
grp3_id = H5Gcreate2(file_id, "New/A", H5P_DEFAULT, gcpl_id, gapl_id);

87 HDF5 and NetCDF-4

88 NetCDF-3 and HDF5
Development, maintenance, and funding:
- NetCDF-3: UCAR Unidata, funded by NSF
- HDF5: The HDF Group, funded by NASA, DOE, and others
Advantages:
- NetCDF-3: popular; simple data model; lots of tools; multiple implementations (Java); data may be recovered after a system crash
- HDF5: flexible; works on 32-bit and 64-bit platforms; high performance; efficient storage; rich collection of datatypes; extensible
Disadvantages:
- NetCDF-3: separate implementations needed to support parallel I/O (Argonne) and 64-bit platforms; limited number of datatypes (e.g. no support for structures); various limitations on variables; modifications after creation are inefficient
- HDF5: complex; steep learning curve; easy to misuse; a system crash may corrupt an HDF5 file

89 Goals of the NetCDF/HDF combination
- Create NetCDF-4, combining desirable characteristics of NetCDF-3 and HDF5 while taking advantage of their separate strengths:
  - the widespread use and simplicity of NetCDF-3
  - the generality and performance of HDF5
- Make NetCDF more suitable for high-performance computing and large datasets
- Provide a simple high-level application programming interface (API) for HDF5
- Demonstrate the benefits of the combination in advanced Earth-science modeling efforts

90 What is NetCDF-4?
A NASA-funded effort to improve:
- Interoperability among scientific data representations
- Integration of observations and model outputs
- I/O for high-performance computing
It provides:
- An extended NetCDF-3 data model for scientific data (better data organization and richer datatypes)
- An extended set of NetCDF-3 APIs for using the model
- A new format for NetCDF data, based on HDF5

91 What is NetCDF-4?
- Developed at Unidata in collaboration with The HDF Group
- Supported by Unidata
- Released in June 2008, based on HDF5 1.8.1
- Available from http://www.unidata.ucar.edu/software/netcdf/netcdf-4

92 NetCDF-4 architecture
(Diagram: netCDF-3 and netCDF-4 applications call the netCDF-3 interface and netCDF-4 library, which sit on top of the HDF5 library; HDF5 applications call HDF5 directly. The stack supports access to netCDF files and to HDF5 files created through the netCDF-4 interface.)

93 NetCDF vs. HDF5 terminology
NetCDF              | HDF5
Dataset             | HDF5 file
Variable            | Dataset
Coordinate variable | Dimension scale

94 Extended NetCDF model
- NetCDF-3 models multidimensional arrays of primitive types with variables, dimensions, and attributes; only one unlimited dimension is allowed.
- HDF5 models multidimensional arrays of complex structures with datasets and attributes; multiple unlimited dimensions are allowed.
- NetCDF-4 implements an extended data model with enhancements made possible by HDF5:
  - Structure types: like C structures, except portable
  - Multiple unlimited dimensions
  - Groups
  - Variable-length objects
  - New primitive types: strings, unsigned types, opaque

95 NetCDF-3 data model
(Class diagram: a Dataset (location: URL; open()) contains Variables (name, shape: Dimension[], type: DataType; read(): Array), Dimensions (name, length: int; isUnlimited()), and Attributes (name, type, value: 1-D array). DataType covers char, byte, short, int, float, and double.)

96 HDF5 Data Model [diagram]: an HDF5 File (location: multiple; open()) contains Groups. Group: name: String, members: Variable[]. Variable: name: String, shape: Dimension[], type: DataType; read() returns an Array. Structure: name: String, members: Variable[]. Attribute: name: String, value: Variable. DataType: byte/unsigned byte, short/unsigned short, int/unsigned int, long/unsigned long, float, double, String, BitField, Enumeration, DateTime, Opaque, Reference, VariableLength.

97 A Common Data Model [diagram]: a Dataset (location: URL; open()) contains Groups. Group: name: String, members: Variable[]. Variable: name: String, shape: Dimension[], type: DataType; read() returns an Array. Structure: name: String, members: Variable[]. Dimension: name: String, length: int; isUnlimited(), isVariableLength(). Attribute: name: String, type: DataType, value: 1-D Array. DataType: byte/unsigned byte, short/unsigned short, int/unsigned int, long/unsigned long, float, double, char, String, Opaque.

98 NetCDF-4 Data Model [diagram]: a NetCDF-4 Dataset (location: URL; open()) contains Groups. Group: name: String, members: Variable[]. Variable: name: String, shape: Dimension[], type: DataType; read() returns an Array. Structure: name: String, members: Variable[]. Dimension: name: String, length: int; isUnlimited(), isVariableLength(). Attribute: name: String, type: DataType, value: 1-D Array. DataType: byte/unsigned byte, short/unsigned short, int/unsigned int, long/unsigned long, float, double, char, String, Opaque.
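The core of the data model in the diagrams above can be sketched as a few plain Python classes (an illustration only, not the API of any NetCDF library; all names and the example data are hypothetical):

```python
# Sketch: the NetCDF-4 data model (Groups containing Variables,
# whose shape is a list of possibly-unlimited Dimensions).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Dimension:
    name: str
    length: int
    unlimited: bool = False      # NetCDF-4 allows several of these

@dataclass
class Variable:
    name: str
    type: str                    # e.g. "int", "float", "String"
    shape: List[Dimension] = field(default_factory=list)

@dataclass
class Group:
    name: str
    variables: List[Variable] = field(default_factory=list)
    subgroups: List["Group"] = field(default_factory=list)

# Example: a root group with an unlimited "time" dimension
time = Dimension("time", 0, unlimited=True)
lat = Dimension("lat", 180)
temp = Variable("temperature", "float", [time, lat])
root = Group("/", variables=[temp])
```

The key NetCDF-4 extensions are visible here: Groups nest, and more than one Dimension may be unlimited, which the NetCDF-3 model forbids.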

99 Glance at NetCDF-4 Performance

100 Sequential NetCDF-3 and NetCDF-4

101 Preliminary performance study (NetCDF-3 and NetCDF-4): "NetCDF-4 Performance Report," June 2008, available at http://www.hdfgroup.uiuc.edu/papers/papers/. Used real NASA data for benchmarks. Compares NetCDF-3 and NetCDF-4 performance for reading and writing data of different dimensionality, using different storage layouts and system caching parameters.

102 Summary (NetCDF-3 and NetCDF-4):
- For contiguous access patterns, performance of NetCDF-4 is comparable with NetCDF-3
- For non-contiguous access patterns, the chunking feature can improve performance (use the right chunk size!)
- NetCDF-4 compression reduces data storage size and I/O time

103 Non-contiguous access: logical layout for 2-dimensional arrays. [figure; array layouts labeled 16384, 1, 16, and 256]

104 Non-contiguous access: data layout in a file, for chunk sizes [16384][1], [8192][1], and [4096][1]. [figure; a selection of 16384 non-adjacent data points]
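The chunk sizes above can be reasoned about arithmetically. A small Python sketch (not presentation code) counting how many chunks a read of one full 16384-element column intersects for each listed chunk size:

```python
# Sketch: count the chunks a hyperslab selection intersects.
# The shapes follow the slide: a 2-D array with 16384 rows,
# reading one full column, with column-shaped chunks.

def chunks_touched(sel_start, sel_count, chunk_shape):
    """Number of chunks intersected by the selection (per-dimension product)."""
    n = 1
    for start, count, chunk in zip(sel_start, sel_count, chunk_shape):
        first = start // chunk                 # first chunk index hit
        last = (start + count - 1) // chunk    # last chunk index hit
        n *= last - first + 1
    return n

column = ((0, 0), (16384, 1))                  # one full column
for chunk in [(16384, 1), (8192, 1), (4096, 1)]:
    print(chunk, "->", chunks_touched(*column, chunk))
# [16384][1] touches 1 chunk, [8192][1] touches 2, [4096][1] touches 4
```

Because each chunk is stored contiguously in the file, fewer chunks touched generally means fewer separate I/O regions for this access pattern.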

105 Non-contiguous write

106 Non-contiguous read

107 Non-contiguous access. Word of caution: the chunk sizes shown will cause poor performance if the access pattern is by row, or by several contiguous rows. If the access pattern is not known, choose a chunk size that accommodates the "extreme" cases.
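To make the caution concrete, a back-of-the-envelope sketch (the array and chunk sizes are assumed for illustration, not taken from the slides): with column-shaped chunks a row read crosses every column of chunks, while a squarish chunk keeps both access patterns reasonable.

```python
import math

def chunks_for(sel_count, chunk_shape):
    """Chunks intersected by a selection starting at index (0, 0) — a sketch."""
    return math.prod(math.ceil(c / k) for c, k in zip(sel_count, chunk_shape))

# Assumed 16384 x 256 array with column-shaped chunks [4096][1]:
print(chunks_for((16384, 1), (4096, 1)))   # column read: 4 chunks
print(chunks_for((1, 256), (4096, 1)))     # row read: 256 chunks

# A squarish chunk [512][256] balances the two access patterns:
print(chunks_for((16384, 1), (512, 256)))  # column read: 32 chunks
print(chunks_for((1, 256), (512, 256)))    # row read: 1 chunk
```

The squarish chunk is somewhat worse for pure column reads but avoids the 256-chunk blowup for row reads, which is the kind of "extreme case" trade-off the slide recommends planning for.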

108 Parallel NetCDF-4, PnetCDF and HDF5

109 HDF5, NetCDF-4 and PnetCDF: performance results for
- PnetCDF, a parallel version of NetCDF-3 from ANL/Northwestern University
- HDF5 parallel library 1.6.5
- NetCDF-4 beta1
For more details see the materials under http://www.hdfgroup.uiuc.edu/papers/papers/

110 HDF5 and PnetCDF performance comparison. Flash I/O website: http://flash.uchicago.edu/~zingale/flash_benchmark_io/. Rob Ross et al., "Parallel netCDF: A High-Performance Scientific I/O Interface."

111 HDF5 and PnetCDF performance comparison. [charts] Bluesky: Power 4; uP: Power 5.

112 HDF5 and PnetCDF performance comparison. [charts] Bluesky: Power 4; uP: Power 5.

113 Parallel NetCDF-4 and PnetCDF. Fixed problem size = 995 MB. Performance of NetCDF-4 is close to PnetCDF. [chart: bandwidth (MB/s) vs. number of processors, 0 to 144, comparing PnetCDF from ANL and NetCDF-4]

114 Thank you! Questions?

