Presentation transcript:

SMAP ISO Metadata in HDF5
ESIP Summer 2013, Chapel Hill, NC
Barry Weiss
Jet Propulsion Laboratory, California Institute of Technology, Pasadena, CA
July 9, 2013

Presentation Outline
- Metadata Coverage and Guidelines
- SMAP ISO Requirement
- Metadata Accessibility – HDF5 Group/Attribute
- Multiple Instantiation of the Same Class
- Simplification of Structure
- Tool Chain Flow for Autogeneration
- Steps Going Forward
© California Institute of Technology. Government Sponsorship Acknowledged 2013-07-09

Metadata Coverage
- Product metadata: applies to the entire content of a data granule
  - Mission-specific information
  - Spatial and time boundary information
  - Data version information: algorithm, Science Processing Software (SPS), Science Data System (SDS) release, HDF5 version
  - Granule lineage or pedigree
  - Lists of the inputs that were used to generate a data granule
  - Technical parameters that apply to the entire data granule
  - Orbit mechanical data
  - Instrument-specific information
  - Small tables of calibration and/or algorithmic coefficients
  - Algorithmic parameters and options
  - Data quality and completeness
  - References to related documentation
- Local metadata: applies to particular arrays in the product
  - Maxima, minima, units, dimension definitions, identification of statistical methods

Metadata Guidelines
- The metadata shall provide users with adequate self-descriptive information to enable an assessment of the content, the quality and the algorithmic conditions associated with any SMAP data product.
- The metadata shall enable users to locate specific and appropriate sets of data that they need for their investigation.
- The metadata shall enable users to correlate, interoperate and integrate SMAP data products with those generated by disparate sources, within and outside of NASA.

SMAP Requirement for Product Metadata
SMAP Level 1 Requirement: SMAP Science Data Product formats shall conform to ISO 19115 “Geographic Information – Metadata”.
ISO metadata must conform to these standards:
- Provide metadata that conforms to the family of ISO 19115 models
- Metadata represented using ISO 19139 compliant serialization
- Ultimate ISO goal: a global standard model in a global standard format
Major Goal: Generate SMAP products that conform to the ISO requirement, while at the same time:
- Ensure that the products are easy to use
- Ensure that the products have a consistent design
- Provide metadata that are easy to locate

ISO 19139 Serialization
SMAP divides the ISO serialized metadata into two discrete packages:
- Dataset metadata XML is auto-generated with each executable instance
  - SMAP software inserts auto-generated metadata into a single attribute in the HDF5 /Metadata Group named “iso_19139_dataset_xml”
  - SMAP SDS delivers auto-generated metadata along with each product in a separate file to the Data Center
- Series metadata XML is curated
  - Update the XML with each delivery
  - SMAP software inserts curated series metadata into a single attribute in the HDF5 /Metadata Group named “iso_19139_series_xml”
  - SMAP SDS delivers curated series metadata to the Data Center in a separate file before the SDS begins product delivery with each new release

Metadata Accessibility
- ISO structure and serialization are ideal for machine access
  - The ISO 19115 model has a well-defined structure
  - The ISO 19139 serialization adheres to that structure
  - Combined, the standards provide data and relationships among the data elements in a very clear and regular form
- ISO is not as accessible for product users
  - The model instantiates the same class multiple times
  - An attribute within the class indicates the specifics of each instantiation
  - That attribute is not easily accessible
  - The rich and complete ISO model is complex to the uninitiated
  - Metadata includes algorithm parameters and run time parameters
  - Locating specific metadata elements can be difficult
- SMAP chose to start simple
  - All product level metadata appear in the /Metadata Group
  - Metadata also appear in clearly named elements and structure

MI_Metadata Class

DQ_Quality Class

SMAP LI_Lineage

CI_Citation Class and Subclasses

Multiple Instantiations – Lineage in SMAP Products
SMAP products employ a large number of input data sets. The Level 1C Radar Product employs the following input data sets:
- SMAP Level 1A Radar Product
- Spacecraft Ephemeris
- Spacecraft Attitude
- Spacecraft Antenna Azimuth
- Spacecraft Clock to UTC Correlation
- Short Term Calibration Data
- Long Term Calibration Data
- Total Electron Content in the Ionosphere
- Digital Elevation Map
- Antenna Pattern
- Block Floating Point Quantization Decoder
Each source requires an instantiation of the LI_Lineage/LE_Source class.
In the Group/Attribute structure, these elements fall in the /Metadata/Lineage HDF5 Group.
Subgroup names reflect the input product described in the group.

Lineage Example
Radar Level 1A metadata contain 11 instances of LE_Source.
The identifier that specifies the Lineage element for each instantiation is in:
LE_Source/sourceCitation/CI_Citation/identifier/MD_Identifier/code
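To illustrate why the identifier is hard to reach in the serialized form, the following is a minimal sketch that walks the path above with Python's standard XML parser. The XML fragment is a hand-built stand-in, not an actual SMAP file; the `gmd`, `gco` and `gmi` prefixes use the usual ISO 19139 namespace URIs, and the wrapping `gmd:source` element and the helper names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Usual ISO 19139 namespace URIs; an actual SMAP file may declare more.
NS = {
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
    "gmi": "http://www.isotc211.org/2005/gmi",
}

# Hypothetical template for one LE_Source instance (illustration only).
SOURCE_TEMPLATE = """
  <gmd:source>
    <gmi:LE_Source>
      <gmd:sourceCitation>
        <gmd:CI_Citation>
          <gmd:identifier>
            <gmd:MD_Identifier>
              <gmd:code>
                <gco:CharacterString>{ident}</gco:CharacterString>
              </gmd:code>
            </gmd:MD_Identifier>
          </gmd:identifier>
        </gmd:CI_Citation>
      </gmd:sourceCitation>
    </gmi:LE_Source>
  </gmd:source>"""

SAMPLE = (
    '<gmd:LI_Lineage xmlns:gmd="http://www.isotc211.org/2005/gmd"'
    ' xmlns:gco="http://www.isotc211.org/2005/gco"'
    ' xmlns:gmi="http://www.isotc211.org/2005/gmi">'
    + "".join(SOURCE_TEMPLATE.format(ident=i) for i in ("L1A_Radar", "Ephemeris"))
    + "\n</gmd:LI_Lineage>"
)

# Path from each LE_Source down to the text of its identifier code.
CODE_PATH = (
    "gmd:sourceCitation/gmd:CI_Citation/gmd:identifier/"
    "gmd:MD_Identifier/gmd:code/gco:CharacterString"
)


def source_identifiers(xml_text):
    """Return the identifier code of every LE_Source instance, in document order."""
    root = ET.fromstring(xml_text)
    identifiers = []
    for source in root.iter("{%s}LE_Source" % NS["gmi"]):
        code = source.find(CODE_PATH, NS)
        if code is not None and code.text:
            identifiers.append(code.text.strip())
    return identifiers


print(source_identifiers(SAMPLE))
```

Every instance has the same element name, so a user has to descend five levels into each one just to learn which input it describes.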

Group/Attribute Metadata Structure
- Employ HDF5 groups and attributes to represent ISO metadata
  - Multiple sub-groups under the HDF5 Metadata group
  - Groups represent major ISO classes
  - Attributes map directly to attributes in the ISO classes
- Reduces deeply nested layers within the HDF5 representation
  - No more than four nested layers
- In some instances, the design employs modified names of HDF5 groups or attributes to ease user comprehension of the model.
- The HDF5 group/attribute structure provides a representation layer that is more user friendly

SMAP Rationale
- Both the ISO 19139 XML and the HDF5 Group/Attribute structure must reflect the ISO model.
- Exclusive use of the model layer would require the development of tools that enable users to find the metadata they seek
- The Group/Attribute structure is, in effect, a tool to ease access
- Over time, the continued use of the ISO model will engender the development of tools and interfaces that ease direct access to the ISO serialization

ISO Metadata Structure Example
ISO 19115 Group/Attribute Model for Lineage in the SMAP L1C Radar Product

/Metadata/Lineage/
  L1A_Radar
    DOI = http://dx.doi.org/10.5067/smap/radar/data100
    creationDate = 2015-05-30
    description = Parsed and reformatted SMAP radar telemetry. The Level 1A Product contains both synthetic aperture radar data and real aperture radar data. The product also includes loopback data as well as health and status data.
    fileName = SMAP_L1A_Radar_00016_A_20150530T160100_R04001_001.h5
    identifier = L1A_Radar
    version = R04001
  Ephemeris
    creationDate = 2015-05-29
    description = One or more data products that list the spacecraft trajectory over the same time period as the input Level 1A radar data.
    fileName = traj_SPK_1505291400_1512291400_1505311200_sci_OD0945_v01.bsp
    version = 01
  AntennaAzimuth
    description = One or more data products that specify the azimuth angle of the antenna on the SMAP spacecraft over the same time period as the input Level 1A radar data.
    fileName = smap_ar_150530153500_150530172515_v01.bc
  Attitude
  ……….
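The practical effect of this layout is that a metadata element can be addressed with a short slash-separated path. The sketch below mimics that with a nested dictionary standing in for the HDF5 groups and attributes; it does not read an HDF5 file, and the `lookup` and `input_files` helpers are hypothetical names introduced for illustration. The attribute values are copied from the example above.

```python
# Nested dict standing in for HDF5 groups (dicts) and attributes (strings).
# This is an in-memory illustration, not an HDF5 reader.
PRODUCT = {
    "Metadata": {
        "Lineage": {
            "L1A_Radar": {
                "identifier": "L1A_Radar",
                "version": "R04001",
                "creationDate": "2015-05-30",
                "fileName": "SMAP_L1A_Radar_00016_A_20150530T160100_R04001_001.h5",
            },
            "Ephemeris": {
                "version": "01",
                "creationDate": "2015-05-29",
                "fileName": "traj_SPK_1505291400_1512291400_1505311200_sci_OD0945_v01.bsp",
            },
        }
    }
}


def lookup(tree, path):
    """Follow a slash-separated path such as /Metadata/Lineage/Ephemeris/version."""
    node = tree
    for part in path.strip("/").split("/"):
        node = node[part]  # raises KeyError if a group or attribute is missing
    return node


def input_files(tree):
    """Map each lineage subgroup name to the input file it records."""
    lineage = lookup(tree, "/Metadata/Lineage")
    return {name: attrs["fileName"] for name, attrs in lineage.items()}


print(lookup(PRODUCT, "/Metadata/Lineage/L1A_Radar/version"))
print(sorted(input_files(PRODUCT)))
```

The subgroup name itself identifies the input, so no descent into a citation structure is needed to tell the instances apart.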

Model Complexity – Locating Algorithm Parameters

Locating Algorithm Parameters
ISO 19115 Group/Attribute Model for Process Step in the SMAP L1C Radar Product

Process Step
  RFI_Threshold = 2.0
  FaradayRotationThreshold = 1.4 degrees
  waterBodyThreshold = 30 percent
  timeVariableEpoch = J2000
  epochJulianDate = 2451545.00
  epochUTCDateTime = 2000-01-01T11:58:55.816Z
  parameterVersionID = 004
  algorithmTitle = Soil Moisture Active Passive Synthetic Aperture Radar processing algorithm
  algorithmVersionID = 007
  algorithmDate = 2015-05-31
  ……….

Provides Direct Access to Critical Metadata Elements within the HDF5 Structure
Items in Red are Additional Attributes. Represented in XML as Record/Record Types
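The three epoch attributes above are mutually consistent, which can be checked with a little arithmetic. The sketch below assumes, as is conventional for J2000 but not stated on the slide, that Julian date 2451545.0 is expressed in Terrestrial Time (TT). TT runs 32.184 s ahead of TAI, and TAI was 32 s ahead of UTC on 2000-01-01, so the UTC equivalent is 64.184 s before noon.

```python
from datetime import datetime, timedelta

# TT - TAI is fixed at 32.184 s; TAI - UTC was 32 s on 2000-01-01.
TT_MINUS_TAI = timedelta(seconds=32, milliseconds=184)
TAI_MINUS_UTC_2000 = timedelta(seconds=32)


def j2000_epoch_utc():
    """Convert the J2000 epoch (JD 2451545.0, assumed to be in TT) to UTC."""
    epoch_tt = datetime(2000, 1, 1, 12, 0, 0)  # noon TT on 1 January 2000
    return epoch_tt - (TT_MINUS_TAI + TAI_MINUS_UTC_2000)


print(j2000_epoch_utc().isoformat(timespec="milliseconds") + "Z")
# prints 2000-01-01T11:58:55.816Z
```

The result matches the epochUTCDateTime attribute, so a reader who only needs the UTC epoch does not have to redo the time-scale conversion.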

Major Groups in HDF5 Group/Attribute Structure
The following are HDF5 groups in the SMAP Group/Attribute Structure. Each maps to an instantiation of an ISO class:
- AcquisitionInformation
- DataQuality
- DatasetIdentification
- Extent
- GridSpatialRepresentation
- Lineage
- OrbitMeasuredLocation
- ProcessStep
- ProductSpecificationDocument
- QADatasetIdentification
- SeriesIdentification

SMAP Tool Chain Flow
[Flow diagram; elements recovered from the slide]
- Inputs: Metadata Configuration File (SMAP-specific XML) and Output Configuration File (SMAP-specific XML)
- SMAP Science Processing Software produces the SMAP Product in HDF5 with Metadata in Group/Attribute Structure
- h5dump produces the Complete Group/Attribute Structure in HDF5 XML
- saxon applies the XSL that maps HDF5 XML to ISO 19139 XML, producing Automated Metadata in ISO 19139 Compliant Serialization
- merge combines the automated metadata with the Curated Series Metadata in ISO 19139 Compliant Serialization
- Output: SMAP Product in HDF5 with Metadata in Group/Attribute Structure and in ISO 19139 Compliant XML

Steps Going Forward
- ISO offers huge promise
  - Common metadata model for all Earth Science Data Products
  - Common metadata representation for all Earth Science Data Products
- ISO is in early stages of real implementation
  - Experience will dictate best methods for user access
  - Tools for ISO extraction are not commonly available
- SMAP employed a modified representation
  - Provides access to metadata in the HDF5 environment in an ISO-like model
- NASA/SMAP will collaborate with the HDF Group and other teams that generate Science Data Software
  - Effort to incorporate methods that extract metadata directly and seamlessly for science data users

Backup

Global Metadata – ISO 19115
“Geographic Information - Metadata” from the International Organization for Standardization
- Provides a standardized means to describe Earth data
- Provides a means to make products “self descriptive and independently understandable”
- Incorporates all of the major categories required for a complete set of global metadata for each product granule
- Incorporates all of the major categories required to generate a complete set of collection metadata.
- Enables fulfillment of the requirement “to correlate, interoperate and integrate SMAP data products with those generated by disparate sources”.
- Uses standardized XML serialization to ease portability to the wider user community. The serialization standard is specified in ISO 19139.

CF Convention – Local Metadata
- The Climate and Forecast (CF) convention is a highly descriptive metadata convention with a widespread science user community
- CF is designed specifically to fit within attributes in netCDF files.
- CF is based upon the Cooperative Ocean/Atmosphere Research Data Service (COARDS) standard
- The CF convention includes:
  - A standard to provide descriptive names for each variable in the product
  - Standards for the specification of data units for each variable in the product. UDUNITS provides a list of supported unit names
  - Standards for fill values for each variable in the product
  - Standards to express the range of data for each variable in the product
  - Standards to express bit flag definitions and define flag values
  - Standards to specify relationships between spatial and time coordinates for each variable in the product. Indicates which particular spatial or temporal coordinates correspond with which dimension axes and indices of a data variable.
  - Standards to specify statistical methods that were used to calculate each variable in the product. Clarifies temporal or spatial intervals that were used to provide statistical results.
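As a small illustration of how these local attributes are used, the sketch below decodes a bit-flag value with the CF attributes `flag_masks` and `flag_meanings` and honors `_FillValue`. The attribute names are CF attribute names; the variable description, the mask values and the flag meanings are invented for the example and are not taken from a SMAP product.

```python
# CF-style attributes for one hypothetical flag variable (values are made up).
QUALITY_FLAG_ATTRS = {
    "long_name": "Retrieval quality flag",
    "_FillValue": 65534,
    "valid_min": 0,
    "valid_max": 7,
    "flag_masks": [1, 2, 4],
    "flag_meanings": "low_quality rfi_detected over_water",
}


def decode_flags(value, attrs):
    """Return the names of the flags set in value, or None for a fill value."""
    if value == attrs["_FillValue"]:
        return None
    names = attrs["flag_meanings"].split()  # CF stores meanings space-separated
    return [name for mask, name in zip(attrs["flag_masks"], names) if value & mask]


print(decode_flags(5, QUALITY_FLAG_ATTRS))
print(decode_flags(65534, QUALITY_FLAG_ATTRS))
```

Because the meanings travel with the array as attributes, a reader needs no external document to interpret a flag value.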

Dataset Metadata
- SMAP developed an XSLT that maps the HDF5 group/attribute metadata in each data product granule into a representation that complies with ISO 19139 XML encoding
- Near the completion of each executable run, the SMAP software:
  - Dumps the group/attribute metadata into HDF5 XML.
  - Executes the open source Saxon XSLT engine to convert HDF5 XML to ISO 19139 XML.
  - Incorporates the ISO 19139 compliant dataset metadata into an HDF5 attribute in the output data product granule
  - Incorporates the curated ISO 19139 series metadata into a separate HDF5 attribute
- The SMAP mission delivers the ISO 19139 compliant dataset metadata to the Data Centers in two forms
  - Embedded in the data product metadata for the user community
  - In a collocated file for Data Center ingestion. The separate file does not travel with the product
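The first two steps above can be sketched as two external commands. This is an assumption-laden outline, not the SMAP software: the file names, the stylesheet name and the Saxon jar path are hypothetical, and the sketch relies only on the `--xml` option of h5dump and the `-s:`, `-xsl:` and `-o:` options of the Saxon command line.

```python
import subprocess


def build_commands(product, hdf5_xml, stylesheet, iso_xml, saxon_jar):
    """Build the argument lists for the dump step and the transform step."""
    # Step 1: h5dump writes the file structure as HDF5 XML on standard output.
    dump = ["h5dump", "--xml", product]
    # Step 2: Saxon applies the XSLT that maps HDF5 XML to ISO 19139 XML.
    transform = [
        "java", "-jar", saxon_jar,
        f"-s:{hdf5_xml}", f"-xsl:{stylesheet}", f"-o:{iso_xml}",
    ]
    return dump, transform


def run_chain(product, hdf5_xml, stylesheet, iso_xml, saxon_jar):
    """Run both steps; requires h5dump, Java and a Saxon jar to be installed."""
    dump, transform = build_commands(product, hdf5_xml, stylesheet, iso_xml, saxon_jar)
    with open(hdf5_xml, "w") as out:
        subprocess.run(dump, stdout=out, check=True)
    subprocess.run(transform, check=True)


# Hypothetical file names, shown without executing anything.
dump_cmd, transform_cmd = build_commands(
    "product.h5", "product_hdf5.xml", "hdf5_to_iso19139.xsl",
    "product_iso.xml", "saxon-he.jar",
)
print(" ".join(dump_cmd))
print(" ".join(transform_cmd))
```

Writing the resulting XML back into the product as an HDF5 attribute would need an HDF5 library and is left out of the sketch.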

Curated Series Metadata
- Systems Engineers curate the series metadata for each data product
  - Model is ISO 19115 compliant with a few SMAP extensions
  - Encoding is ISO 19139 compliant
  - One file represents a specific SMAP data product for each build
- The SMAP SDS delivers the curated series metadata to ESDIS with each build.
  - This delivery enables ingestion of data products at the Data Centers
- SMAP software automatically incorporates the entire series metadata into a single HDF5 attribute in each data product granule

Standard Representation of Additional Attributes

<eos:additionalAttribute>
  <eos:EOS_AdditionalAttribute>
    <eos:reference>
      <eos:EOS_AdditionalAttributeDescription>
        <eos:type>
          <eos:EOS_AdditionalAttributeTypeCode codeList="http://earthdata.nasa.gov/metadata/resources/" codeListValue="processingInformation">processingInformation</eos:EOS_AdditionalAttributeTypeCode>
        </eos:type>
        <eos:identifier>
          <gmd:MD_Identifier>
            <gmd:code>
              <gco:CharacterString>uuid for epochJulianDate</gco:CharacterString>
            </gmd:code>
            <gmd:codeSpace>
              <gco:CharacterString>http://smap.jpl.nasa.gov</gco:CharacterString>
            </gmd:codeSpace>
          </gmd:MD_Identifier>
        </eos:identifier>
        <eos:name>
          <gco:CharacterString>epochJulianDate</gco:CharacterString>
        </eos:name>
        <eos:dataType>
          <eos:EOS_AdditionalAttributeDataTypeCode codeList="http://earthdata.nasa.gov/metadata/resources/Codelists.xml#EOS_AdditionalAttributeDataTypeCode" codeListValue="FLOAT">FLOAT</eos:EOS_AdditionalAttributeDataTypeCode>
        </eos:dataType>
      </eos:EOS_AdditionalAttributeDescription>
    </eos:reference>
    <eos:value>
      <gco:CharacterString>2451545</gco:CharacterString>
    </eos:value>
  </eos:EOS_AdditionalAttribute>
</eos:additionalAttribute>
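Reading a value back out of this representation takes several steps, which the following minimal sketch walks through with Python's standard XML parser. The sample is an abbreviated copy of the fragment above; the `eos` namespace URI is an assumption (the slide shows only the prefix), and the `INT` and `STRING` type codes are hypothetical, since only `FLOAT` appears in the source.

```python
import xml.etree.ElementTree as ET

NS = {
    "eos": "http://earthdata.nasa.gov/schema/eos",  # assumed namespace URI
    "gco": "http://www.isotc211.org/2005/gco",
}

# Abbreviated copy of the fragment above, with namespace declarations added.
SAMPLE = """\
<eos:additionalAttribute xmlns:eos="http://earthdata.nasa.gov/schema/eos"
                         xmlns:gco="http://www.isotc211.org/2005/gco">
  <eos:EOS_AdditionalAttribute>
    <eos:reference>
      <eos:EOS_AdditionalAttributeDescription>
        <eos:name>
          <gco:CharacterString>epochJulianDate</gco:CharacterString>
        </eos:name>
        <eos:dataType>
          <eos:EOS_AdditionalAttributeDataTypeCode codeListValue="FLOAT">FLOAT</eos:EOS_AdditionalAttributeDataTypeCode>
        </eos:dataType>
      </eos:EOS_AdditionalAttributeDescription>
    </eos:reference>
    <eos:value>
      <gco:CharacterString>2451545</gco:CharacterString>
    </eos:value>
  </eos:EOS_AdditionalAttribute>
</eos:additionalAttribute>
"""

# Only FLOAT appears in the source; the other codes are illustrative guesses.
CASTS = {"FLOAT": float, "INT": int, "STRING": str}


def read_additional_attributes(xml_text):
    """Return {name: typed value} for each EOS_AdditionalAttribute element."""
    root = ET.fromstring(xml_text)
    result = {}
    for attr in root.iter("{%s}EOS_AdditionalAttribute" % NS["eos"]):
        desc = attr.find("eos:reference/eos:EOS_AdditionalAttributeDescription", NS)
        name = desc.findtext("eos:name/gco:CharacterString", namespaces=NS)
        type_code = desc.findtext(
            "eos:dataType/eos:EOS_AdditionalAttributeDataTypeCode", namespaces=NS
        )
        raw = attr.findtext("eos:value/gco:CharacterString", namespaces=NS)
        # Unknown type codes fall back to keeping the value as a string.
        result[name] = CASTS.get(type_code, str)(raw)
    return result


print(read_additional_attributes(SAMPLE))
```

In the Group/Attribute structure the same value is simply the epochJulianDate attribute of the process step group, which is the contrast the presentation draws.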