NP-EMD.2006.580.0001 Profile of National Polar-Orbiting Operational Environmental Satellite System (NPOESS) HDF5 Files. Kim Tomashosky, Ken Stone, Pat Purcell, Ron Andrews.

Presentation transcript:

NP-EMD Profile of National Polar-Orbiting Operational Environmental Satellite System (NPOESS) HDF5 Files
Kim Tomashosky, Ken Stone, Pat Purcell, Ron Andrews
NPOESS Program, Aurora, Colorado

NP-EMD Introduction Kim Tomashosky

NP-EMD About NPOESS
The National Polar-orbiting Operational Environmental Satellite System (NPOESS) is a satellite system used to monitor global environmental conditions and to collect and disseminate data related to:
  – Weather
  – Atmosphere
  – Oceans
  – Land
  – Near-space environment
NPOESS will converge existing polar-orbiting satellite systems under a single national program
Polar-orbiting satellites observe Earth from space
  – They collect and disseminate data on Earth's weather, atmosphere, oceans, land, and near-space environment
  – The polar orbiters are able to monitor the entire planet and provide data for long-range weather and climate forecasts

NP-EMD About NPOESS, Continued
NPOESS increases the timeliness and accuracy of severe weather event forecasts
It will collect over 50 environmental measurements that are crucial to timely, accurate weather forecasts by military and civilian organizations. It will enable:
  – Increased accuracy in severe storm warnings and forecasting
  – Improved drought analysis and flood warnings
It is managed by the tri-agency Integrated Program Office (IPO), utilizing personnel from the Department of Commerce, Department of Defense, and NASA

NP-EMD NPOESS Data Products
NPOESS Data Products are distributed in HDF5 format
  – Archived and made available to the community via the Comprehensive Large Array-data Stewardship System (CLASS), an electronic library of NOAA environmental data
  – There is no “HDF-NPOESS” library; NPOESS Data Products have been designed using the native HDF5 library
NPOESS Data Products:
  – Raw Data Records (RDR)
  – Sensor Data Records (SDR) / Temperature Data Records (TDR)
  – Intermediate Products (IP)
  – Application Related Products (ARP)
  – Environmental Data Records (EDR)

NP-EMD Data Organization
Data Product Granules
  – A segment of data, with the size optimally determined to achieve maximum efficiency for an algorithm class
  – Each granule is associated with an integer number of sensor scans, and its definition varies across sensors and data products
  – Gaps in granules are filled using a pre-defined ‘missing data’ fill value
  – Represented as a set of region reference pointers to sections of the respective dataset arrays
Data Product Aggregations
  – A grouping of the same kind of granules, packaged in HDF5, covering a temporal range
  – May contain as few as one granule and as many as an orbit of granules
  – Represented as a set of object reference pointers to the various groupings of data which make up a particular data product (one for each homogeneous dataset included in the granule)

NP-EMD NPOESS Documentation
Documentation for the NPOESS Data Products
  – NPOESS Common Data Format Control Book – External (CDFCB-X)
      » Volume I – Overview
      » Volume II – RDR Formats
      » Volume III – SDR/TDR Formats
      » Volume IV – EDR/IP/ARP Formats
      » Volume V – Metadata
      » Volume VI – Ancillary Data, Auxiliary Data, Messages, and Reports
      » Volume VII – Application Packets

NP-EMD NPOESS HDF5 General Overview Ron Andrews

NP-EMD HDF5 Conceptual Diagram

NP-EMD HDF5 XML User Block
The XML User Block for NPOESS Data Products provides a ‘quick-look’ into the metadata of the associated HDF5 file
  – The size of the HDF5 XML User Block will be a multiple of 512 bytes
The XML User Blocks are defined in the following volumes of the CDFCB-X:
  – Volume V – Metadata. Contains the XML User Block formats for:
      » Raw Data Records (RDR)
      » Sensor Data Records (SDR) / Temperature Data Records (TDR)
      » Intermediate Products (IP)
      » Application Related Products (ARP)
      » Environmental Data Records (EDR)
  – Volume VI – Ancillary, Auxiliary, Reports, and Messages. Contains the XML User Block formats for the Ancillary and Auxiliary data files that are delivered in HDF5
Example elements:
  – Mission, Platform, and Instrument Names
  – Number_of_Data_Products
  – CollectionShortName(s)
  – Aggregation Information
  – Timestamps
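
Because the user block sits in front of the HDF5 data proper, it can in principle be read with ordinary file I/O. The sketch below is one possible way to do that in Python, not the method used by the NPOESS program. It relies on the HDF5 file format rule that the superblock signature appears at byte offset 0, 512, 1024, 2048, and so on (doubling each time). The XML tag names, the NUL padding, and the synthetic file are assumptions made for illustration only.

```python
import io

# The 8-byte signature that opens every HDF5 superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def read_user_block(stream):
    """Return the bytes that precede the HDF5 superblock (b"" if there are none)."""
    offset = 0
    while True:
        stream.seek(offset)
        chunk = stream.read(len(HDF5_SIGNATURE))
        if len(chunk) < len(HDF5_SIGNATURE):
            raise ValueError("no HDF5 signature found")
        if chunk == HDF5_SIGNATURE:
            stream.seek(0)
            return stream.read(offset)
        # Candidate superblock offsets are 0, 512, 1024, 2048, ...
        offset = 512 if offset == 0 else offset * 2


def user_block_text(stream):
    """Decode the user block as text.

    Assumption: the XML is padded with NUL bytes up to the block size.
    """
    return read_user_block(stream).rstrip(b"\x00").decode("utf-8")


# Synthetic stand-in for a real file: a 1024-byte user block followed by the
# signature. The tag names here are hypothetical, not taken from the CDFCB-X.
xml = b"<UserBlock><Mission_Name>NPP</Mission_Name></UserBlock>"
fake_file = io.BytesIO(xml.ljust(1024, b"\x00") + HDF5_SIGNATURE + b"\x00" * 64)

print(user_block_text(fake_file))
```

With a real file, the same functions would be called on a handle opened with open(path, "rb"); the returned text could then be passed to any XML parser.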

NP-EMD General HDF5 File Structure

NP-EMD NPOESS HDF5 Metadata Locations
The NPOESS HDF5 metadata is organized hierarchically, from the top down, in order to reduce duplication of information and to take advantage of the hierarchical nature of HDF5
  – Root Group
      Data Products Group
        – Data Product (indicated by the specific product’s identifier)
            » Product Aggregation Dataset
            » Product Granule Dataset

NP-EMD HDF5 Conceptual Diagram - Data

NP-EMD NPOESS Quality Flags Overview
The concept is to provide consistently stored, high-density quality information about the delivered data, simplifying usability while maintaining storage efficiency
Each quality flag occupies one or more consecutive bits within a byte
Quality flag arrays follow the structure of the data product
  – The size of each array is equal to or less than the size of the data to which the quality information applies (dimensions correspond to the data product arrays)
Quality flags are stored in the HDF5 files as n two- or three-dimensional, 1-byte arrays
  – The number of arrays is dependent on the quality flag definitions, which are specific to each data product
  – Each byte may contain multiple bit-level flags
  – Quality flags are ordered such that each flag is entirely contained within a single byte, occasionally resulting in a byte with reserved or meaningless bits
  – Byte alignment is the same for every quality flag array
      » First bit (left-most) is the LSB
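
To make the bit packing concrete, here is a minimal Python sketch of pulling bit-level flags out of a 1-byte quality flag value, counting bit 0 as the LSB as the slide states. The flag names, positions, and widths in LAYOUT are hypothetical; the real layouts are defined per product in the CDFCB-X.

```python
def extract_flag(byte_value, first_bit, width):
    """Return the flag held in `width` consecutive bits, starting at `first_bit` (bit 0 = LSB)."""
    if not 0 <= byte_value <= 0xFF:
        raise ValueError("quality flag values are single bytes")
    if first_bit < 0 or width < 1 or first_bit + width > 8:
        raise ValueError("a flag must fit entirely within one byte")
    return (byte_value >> first_bit) & ((1 << width) - 1)


# Hypothetical layout: flag name -> (first bit, width in bits).
# Bits 5-7 are left unassigned, like the reserved bits the slide mentions.
LAYOUT = {"quality": (0, 2), "day_night": (2, 1), "cloud": (3, 2)}


def decode_byte(byte_value, layout=LAYOUT):
    """Split one quality flag byte into its named bit-level flags."""
    return {
        name: extract_flag(byte_value, first_bit, width)
        for name, (first_bit, width) in layout.items()
    }


# Two made-up elements of a quality flag array.
qf_row = bytes([0b00001101, 0b00010010])
decoded = [decode_byte(value) for value in qf_row]
print(decoded)
```

Since the quality flag arrays share dimensions with the data arrays, the same decoding would be applied element by element at matching indices.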

NP-EMD Dimensional Array Example

NP-EMD Detailed NPOESS UML Models Ken Stone

NP-EMD RDR UML Model

NP-EMD Common RDR Layout

NP-EMD SDR/TDR UML Model

NP-EMD EDR UML Model

NP-EMD Geolocation UML Model

NP-EMD Ancillary/Auxiliary UML Models

NP-EMD NPOESS Sample Data: Reading the NPOESS HDF5 File with the HDF API
Patrick Purcell

NP-EMD VIIRS Ice Surface Temperature (IST) Environmental Data Record (EDR) Example UML Model

NP-EMD The NPOESS Granule - Product Profile: Ice Surface Temperature
The Product Profile describes the NPOESS granule. For Ice Surface Temperature, the fields in the granule are:
  – IST_Array (Shown below)
  – QF1_VIIRSISTEDR (Shown below)
  – QF2_VIIRSISTEDR
  – QF3_VIIRSISTEDR
  – ISTFactors (Scale & Offset – Shown below)
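
As an illustration of how a scale and offset pair such as ISTFactors might be applied, the Python sketch below converts stored integers to physical values. The linear form (stored value times scale, plus offset), the numeric factors, and the fill value are all assumptions made for this example; the authoritative definitions are in the Product Profile.

```python
def unscale(stored_values, scale, offset, fill_values=()):
    """Convert stored integers to physical values.

    Assumes physical = stored * scale + offset. Elements equal to a fill
    value carry no data and are returned as None.
    """
    return [
        None if value in fill_values else value * scale + offset
        for value in stored_values
    ]


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
ist_factors = (0.5, 200.0)   # (scale, offset)
missing = {65535}            # made-up 'missing data' fill value
stored = [100, 120, 65535]

temperatures = unscale(stored, *ist_factors, fill_values=missing)
print(temperatures)
```

Screening fill values before scaling matters because a fill value run through the formula would otherwise look like a valid, if implausible, measurement.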

NP-EMD The NPOESS Granule - Product Profile IST Quality Flag Byte 1

NP-EMD The NPOESS Granule - Product Profile IST Scale Factors

NP-EMD VIIRS Ice Surface Temperature (IST) EDR – HDFView Screenshot

NP-EMD The NPOESS Granule – HDF View
The granule dataset array “VIIRS-IST-EDR_Gran_1” contains object IDs that “point” (dereference) to the second region of each dataset array under the “VIIRS-IST-EDR_All” group:
  – The first object ID in the VIIRS-IST-EDR_Gran_1 array dereferences to the middle portion of the IST_Array
  – All of these “portions” share the same time effectivity and other granule-level metadata

NP-EMD References to Regions

NP-EMD NPOESS Granules – Dereferencing to Datasets: Suggested Improvements to the HDF API
Problem: When dereferencing to a portion of a dataset, there is currently no way to learn the name of the dataset through the API
  – Solution: Add an API function that returns the name of the referenced dataset
  – Note: This will be added to v1.8 beta
Problem: When dereferencing to a portion of a dataset, a copy of the entire dataset’s dataspace is returned. The requested selection is populated with data, but the other regions of the dataspace contain fill data
  – When the selection is a simple hyperslab, many users expect to retrieve only the hyperslab referenced, not a copy of the entire dataspace
  – This leads to confusion: a novice user of references will tend to size arrays to the selection, not to the entire dataset’s dataspace
  – Solution: Add an option to the API that allows only the selection to be returned from the H5Rdereference command when choosing simple, contiguous hyperslab selections (as with NPOESS HDF5 granules)
Any other suggestions from users?

NP-EMD NPOESS HDF5 Files Summary
  – The NPOESS Program delivers the official deliverable data products (RDR, SDR/TDR, EDR/ARP/IP) and dynamic ancillary data and auxiliary data in HDF5 files
  – The HDF5 files have an XML User Block that can be accessed without HDF5 tools; it provides a “quick-look” into the metadata before opening the HDF5 file
  – Metadata within the HDF5 files are stored as attributes
  – There are general UML Models for the NPOESS official delivered data that provide a common framework
  – Official deliverable data products are organized by reference objects (aggregations) which contain one or more reference regions (granules)
  – Although data may be accessed directly through the All Data group, the Data Products group provides integrated access:
      » Allows the user to access both metadata and data through a common HDF5 group
      » Metadata is accessed directly by reading the attribute values
      » Datasets may be accessed by dereferencing the object ID stored in the Data Products group for the aggregation or granule
  – NPOESS HDF5 files provide flexibility for a variety of end users

NP-EMD Backup Slides

NP-EMD NPOESS Granules – Dereferencing to Datasets: Details
(See the HDF5 User’s Guide release 1.6.5, Chapter 2, “The HDF5 Library and Programming Model”, Section 2, “Dataspace Function Summaries”, for the H5S commands)
Note that the H5S API commands fall into two broad categories:
1. Dataspace Management & Query Functions
  These functions operate on the entire dataspace
  – The entire dataspace is equivalent to an entire (temporal) aggregated array’s dataspace in an NPOESS HDF5 file under the “All_Data” group
  Example: H5Sget_simple_extent_npoints
  – Returns the number of elements in the entire array under “All_Data” for HDF5 NPOESS
  – For VIIRS-IST-EDR_Gran_1, the first reference in the array (referencing the IST_Array) would return 768 x 3200 = 2,457,600 points
2. Dataspace Selection Functions – hyperslabs and points
  These functions operate on a hyperslab or a point selection
  For NPOESS HDF5 files, the “selection” is equivalent to the granule (hyperslab) for a particular field (array)
  The “selection” is the portion of the data array the reference “points” to:
  – Example: H5Sget_select_npoints
      » Determines the number of points in a dataspace selection
      » For HDF5 NPOESS, this would be the number of points in a granule for a particular field
      » For VIIRS-IST-EDR_Gran_1, the first reference in the array (referencing the IST_Array) would return 256 x 3200 = 819,200 points
  – Note that the “select” in the API command is short for “selection”. It is not a redundant term for “get”.
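
The distinction between the two function families can be modelled in a few lines of plain Python. This sketch does not call the HDF5 library; the Dataspace class is a hypothetical stand-in whose two methods mirror what H5Sget_simple_extent_npoints and H5Sget_select_npoints report. The start row of 256 for Gran_1 is inferred from it being the second of three equal granules and is an assumption.

```python
from dataclasses import dataclass
from math import prod
from typing import Optional, Tuple


@dataclass
class Dataspace:
    """Toy model of a dataspace: a full extent plus an optional hyperslab selection."""

    dims: Tuple[int, ...]
    start: Optional[Tuple[int, ...]] = None
    count: Optional[Tuple[int, ...]] = None

    def select_hyperslab(self, start, count):
        """Select a simple, contiguous block inside the extent."""
        if not (len(start) == len(count) == len(self.dims)):
            raise ValueError("start and count must match the rank of the extent")
        for s, c, d in zip(start, count, self.dims):
            if s < 0 or c < 1 or s + c > d:
                raise ValueError("hyperslab falls outside the extent")
        self.start, self.count = tuple(start), tuple(count)

    def simple_extent_npoints(self):
        """Points in the entire array (compare H5Sget_simple_extent_npoints)."""
        return prod(self.dims)

    def select_npoints(self):
        """Points in the current selection (compare H5Sget_select_npoints)."""
        # With no explicit selection, the whole extent counts as selected.
        return prod(self.count) if self.count is not None else prod(self.dims)


# IST_Array extent from the slide: 768 x 3200, three granules of 256 rows each.
ist_space = Dataspace(dims=(768, 3200))
ist_space.select_hyperslab(start=(256, 0), count=(256, 3200))  # assumed Gran_1 region

print(ist_space.simple_extent_npoints())  # whole aggregated array
print(ist_space.select_npoints())         # Gran_1 only
```

Sizing a read buffer from the first number rather than the second is exactly the difference between holding the whole aggregation and holding one granule.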

NP-EMD Extract from HDF5 User’s Guide (1.6.5), Section “The Programming Model”: Reading and Writing a Portion of a Dataset
A “selection” may be:
  – A hyperslab (NPOESS uses this only)
  – A union of hyperslabs
  – A list of independent points
  – Note: These illustrations show a mapping procedure to another dataspace. The HDF5 API does not do this when you dereference; this would be user defined.

NP-EMD h5dump Screenshot – VIIRS Sea Surface Temperature HDF5 File
Another way to view the arrays of references (Aggregation and Granule dataset arrays) is with the h5dump utility:
  – Granule:
  – Aggregation:
  – Note: Currently, the only way to match the object ID in the granule/aggregation datasets is to manually list the aggregation as shown above using h5dump, or to look up the order in the NPOESS Data Format Control Book - External. The HDF Group will add the ability to obtain the name of the dataset a reference points to in v1.8 beta.

NP-EMD Needed Improvements in HDF H5R – Dereference API
Suggestion: Allow the user to directly dereference and read only the hyperslab selection that the reference “points” to. The size of the returned dataset should be the size of the selection only, not the size of the entire dataspace for the object referenced.
  – Currently, the H5Rdereference call returns a handle to the dataset referenced and therefore provides access to that dataset’s dataspace using H5Sget_ commands
Note that the reference can point to a very complex set of hyperslabs and/or individual points. The NPOESS selection is not complex; it is a simple hyperslab.
Example: We request Granule 1 (the second granule). The reference returns a handle to the entire dataset. Granule regions 0 and 2 contain fill data, while the requested Gran_1 contains the data selection defined by the reference. The data must be read to a new array in order to obtain an array with just the desired Gran_1’s data and size.
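
The final copy step described in the example can be sketched in plain Python as follows. This is an illustration of the idea only and does not use the HDF5 library: the array is a small list of lists, and the dimensions, fill value, and data values are made up.

```python
def extract_hyperslab(full_array, start, count):
    """Copy a simple 2-D hyperslab out of a full-size array into a new, smaller array."""
    row0, col0 = start
    nrows, ncols = count
    return [row[col0:col0 + ncols] for row in full_array[row0:row0 + nrows]]


FILL = -999.0  # hypothetical fill value

# Toy stand-in for what comes back after dereferencing Gran_1: a 6 x 4 array
# covering three granules of 2 rows each, where only the middle granule holds
# real data and regions 0 and 2 hold fill data.
full = [[FILL] * 4 for _ in range(6)]
for r in (2, 3):
    full[r] = [float(10 * r + c) for c in range(4)]

# Read the selection into a new array sized to the granule alone.
gran_1 = extract_hyperslab(full, start=(2, 0), count=(2, 4))
print(gran_1)
```

The start and count passed here play the role of the hyperslab bounds recorded in the region reference, so the new array ends up with the granule's size rather than the aggregation's.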

NP-EMD Needed Improvements in HDF H5R – Dereference API (cont)
Screenshot of output from the ISTFactors Array
  – A handle to the dataset is returned with the corresponding dataspace (in this example, size = 6)
  – The selected region contains the valid data from VIIRS-IST-EDR_Gran_0 ( and ). Other regions are (approximately) filled to zero.

NP-EMD Sample Code (p1) – Reads a Multi-Granule HDF5 NPOESS File

NP-EMD Sample Code (p2)

NP-EMD Sample Code (p3)

NP-EMD Sample Code (p4) – Code Output

NP-EMD Sample Files & HDF5 Reference API Summary
  – NPOESS granules are made up of portions of one or more dataset arrays. In order to access a granule, the granule dataset must be read and each object ID dereferenced using the HDF Reference API (H5R)
  – Use H5Sget_... commands to retrieve information about the entire dataspace of the array containing a reference’s selection (or hyperslab)
  – Use H5Sget_select_... commands to retrieve information about the selection only
  – Suggested future enhancements to the HDF Reference API:
      » Add the ability to retrieve the name of the dataset containing a particular selection (to be added with v1.8 beta)
      » Add the ability to directly retrieve the hyperslab sized to the dataspace of the hyperslab only, not the size of the entire dataset referenced