HDF5 Tutorial @ICALEPCS2017 October 8, 2017 Elena Pourmal Copyright 2016, The HDF Group.


1 HDF5 Tutorial @ICALEPCS2017 October 8, 2017
Elena Pourmal Copyright 2016, The HDF Group.

2 Q1: How would you describe your knowledge of HDF5?

3 Q3: If you selected HDF5 features above, what would you like to learn more about?

4 Goals of today’s presentations
Help new users to start with HDF5
Answer your questions as we go through the material
Help everyone to avoid major HDF5 pitfalls

5 The HDF Group and HDF5

6 Who is The HDF Group?
The HDF Group has developed open source solutions for Big Data challenges for nearly 30 years
Small company (~40 employees) with a focus on High Performance Computing and Scientific Data
Offices in Champaign, IL + Boulder, CO
Our flagship platform, HDF5, is at the heart of our open source ecosystem
Tens of thousands use HDF5 every day, as well as build their own solutions (800+ projects on GitHub)
“De-facto standard for scientific computing”, integrated into every major analytics + visualization tool

7 What does the HDF Group do?
Products: HDF5 (Open Source) + “Enterprise” (Future); Connectors: ODBC + Cloud (Beta); Add-Ons: compression + VOL plugins + VFD plugins
Support: Support Packages (Basic, Professional, Premier, Customized); Support for h5py + PyTables + pandas (NEW); Training
Consulting: HDF5 new functionality + performance tuning for specific platforms; General HPC software engineering with scientific expertise

8 Our Industries
Financial Services, Oil and Gas, Aerospace, Automotive, Medical & Biotech, Silicon Manufacturing, Electronics Instrument, Government, Defense & National Security, Academic Research

9 Why Use HDF5?
I/O library and tools optimized for scale and speed
Self-documenting container optimized for scientific data
Users who need both features

10 TRILLION-PARTICLE SIMULATION
Lawrence Berkeley National Laboratory (LBNL)
Complex collisions of particles that light up the aurora borealis can fracture Earth's magnetic shield and wreak havoc on electronics, power grids, and space satellites. Visualizations of trillion-particle datasets, made possible with HDF5, are helping scientists decipher how.
The simulation ran on the NERSC Cray XE6 on 120,000 cores, using:
80% of computing resources
90% of available memory
50% of the Lustre scratch system
It wrote 10 one-trillion-particle dumps of 30-42 TB each to HDF5 files, sustaining ~27 GB/sec, for a total of 350 TB in HDF5.

11 EARTH OBSERVING SYSTEM
NASA delivers 6,700 different data products to 12 data archive centers
Nearly 16 terabytes per day are redistributed to more than 1.7 million end users worldwide

12 When we say ‘HDF5’…
…we usually mean one of the following:
1. A Data Model that organizes array variables in a hierarchical structure
2. A Library that maps/manages model instances in storage contexts (core, FS, net, obj. store)
3. A self-describing “file” Format for serializing model instances into single or multi-file layouts
4. The technology stack that includes 1.-3. (HDF5 data model, HDF5 library, HDF5 “file” format)
5. A domain-specific format implemented on top of 4. (HDF5 as a Universal File Format)
6. An Ecosystem (language bindings, 3rd party apps., standards)
Open source
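As a rough illustration of the data model, library, and storage-context ideas on this slide, the sketch below uses h5py (one of the Python bindings named in this tutorial) to build a small hierarchy of groups, one array dataset, and a few attributes. The file, group, dataset, and attribute names are invented for the example. It also assumes the "core" driver with backing_store=False, which keeps the file in memory, so nothing is written to disk.

```python
import h5py
import numpy as np


def build_example():
    # "core" driver + backing_store=False: the file lives only in memory.
    f = h5py.File("example.h5", "w", driver="core", backing_store=False)
    # Groups form the hierarchy (intermediate groups are created as needed).
    run = f.create_group("experiment/run_001")  # hypothetical names
    # A dataset holds an array variable; here a 3 x 4 array of doubles.
    data = np.arange(12, dtype="f8").reshape(3, 4)
    temps = run.create_dataset("temperature", data=data)
    # Attributes keep metadata next to the object they describe.
    temps.attrs["units"] = "K"
    run.attrs["operator"] = "example"
    return f


f = build_example()

# Walk the hierarchy: visit() calls the function with each object's path.
names = []
f.visit(names.append)

temperature = f["experiment/run_001/temperature"]
print(names)
print(temperature.shape, temperature.dtype, temperature.attrs["units"])
```

Because the structure and metadata travel with the data, a reader with no prior knowledge of the layout can discover it by walking the file, which is what the visit() call does here.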

13 Why is this concept so different and useful?
Support for multidimensional data of complex types
Data and metadata in one place streamlines data lifecycle and workflow
Portable between different storage: FS, Object Store, fast memory and slow memory (backing store)
Pluggable data transformation for compression, integrity, encryption, etc.
High-performance I/O
Large ecosystem (800+ GitHub projects, e.g., h5py, PyTables, pandas)
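Two of the points above, complex multidimensional types and pluggable data transformation, can be sketched with h5py as follows. This is only an illustrative example under stated assumptions: the record layout, dataset names, chunk shape, and compression level are made up, and gzip and Fletcher32 stand in for the wider set of filters the slide refers to. The file again uses the in-memory "core" driver, so nothing is written to disk.

```python
import h5py
import numpy as np

# Hypothetical record layout: a compound type with three fields.
sample_t = np.dtype([("time", "f8"), ("channel", "i4"), ("value", "f4")])
samples = np.zeros(100, dtype=sample_t)
samples[0] = (0.5, 2, 1.25)

with h5py.File("filters.h5", "w", driver="core", backing_store=False) as f:
    # 3-D array stored in chunks; each chunk passes through the filter
    # pipeline: gzip compression plus a Fletcher32 checksum for integrity.
    image = f.create_dataset(
        "image",
        shape=(64, 64, 3),
        dtype="u1",
        chunks=(16, 16, 3),
        compression="gzip",
        compression_opts=4,
        fletcher32=True,
    )
    image[...] = 7  # filters are applied transparently on write

    # 1-D dataset of compound records, also gzip-compressed.
    records = f.create_dataset("records", data=samples, compression="gzip")

    # Read back while the file is open; filters are undone transparently.
    image_filters = (image.compression, image.fletcher32, image.chunks)
    pixel = int(image[10, 20, 1])
    first = records[0]
    first_channel = int(first["channel"])
    first_value = float(first["value"])

print(image_filters, pixel, first_channel, first_value)
```

The application code reads and writes plain arrays; the filters are a property of the dataset, which is what makes the transformation "pluggable" from the caller's point of view.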

14 What isn’t HDF5?
Algorithm or Analytics Tool: we provide the data, users provide their “secret sauce”
Shrink-Wrapped Service: HDF5 is an SDK for developers to embed into their own solutions
Fully-Featured Database: HDF5 eliminates anything that slows down I/O performance

15 Questions?

