
1 EDGE: The Multi-Metadata Standards Platform
Thomas Huang and Edward Armstrong, PO.DAAC/JPL
National Aeronautics and Space Administration, Jet Propulsion Laboratory, California Institute of Technology, Pasadena, California
2014 ESIP Summer Meeting, Copper Mountain, CO

2 NASA PO.DAAC
The NASA Physical Oceanography Distributed Active Archive Center (PO.DAAC) at the Jet Propulsion Laboratory is an element of the Earth Observing System Data and Information System (EOSDIS). EOSDIS provides science data to a wide community of users for NASA’s Science Mission Directorate. PO.DAAC archives and distributes data relevant to the physical state of the ocean. Its mission is to preserve NASA’s ocean and climate data and make these universally accessible and meaningful.
http://podaac.jpl.nasa.gov

3 Our Users Need Our Help
- Discover/identify the relevant data
- Deliver information (metadata) that our user communities can understand: What to package? How to package?
- Retrieve the relevant data
- Use and understand the data content
[Figure: PO.DAAC User Communities]

4 EDGE Architecture
- EDGE: Extensible Data Gateway Environment
- The brain behind PO.DAAC’s web portal and its Consolidated Web Service platform
- An architecture for metadata aggregation and translation

5 EDGE Metadata Translation Architecture
[Figure: metadata standard templates for domain-specific mappings]
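The idea of per-standard templates on the slide above can be sketched roughly as follows. This is a minimal illustration, not EDGE's actual implementation: the internal record, its field names, the dataset title, and the template fragments are all assumptions, and the fragments are far from complete FGDC or GCMD records.

```python
from string import Template
from xml.sax.saxutils import escape

# Hypothetical internal record; field names and title are illustrative assumptions.
record = {
    "short_name": "OSDPD-L2P-MSG02",
    "title": "Example sea surface temperature dataset",
}

# One template per metadata standard. Supporting another standard means
# adding a template, not rewriting the retrieval code.
# These are simplified fragments, not complete or valid records.
TEMPLATES = {
    "fgdc": Template(
        "<metadata><idinfo><citation><citeinfo>"
        "<title>$title</title>"
        "</citeinfo></citation></idinfo></metadata>"
    ),
    "gcmd": Template(
        "<DIF><Entry_ID>$short_name</Entry_ID>"
        "<Entry_Title>$title</Entry_Title></DIF>"
    ),
}


def translate(record, standard):
    """Render an internal record through the template for one standard."""
    # Escape values so characters such as '&' cannot break the XML.
    safe = {key: escape(str(value)) for key, value in record.items()}
    return TEMPLATES[standard].substitute(safe)


print(translate(record, "gcmd"))
```

Keeping the mapping in templates means the domain knowledge (which internal field feeds which element) lives in one place per standard.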

6 PO.DAAC Web Portal and Datacasting

7 OpenSearch
- ESIP Discovery Specification
- RSS and Atom
- Dataset and granule searches

```shell
curl -X GET "http://podaac.jpl.nasa.gov/ws/search/dataset/?format=rss&keyword=ocean"
```
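The same dataset search can be sketched in Python. The base URL and the `format`, `keyword`, and `startIndex` parameters come from the slides; the helper names and the sample Atom feed below are assumptions made up for illustration, so that the parsing step can be tried without network access.

```python
from urllib.parse import urlencode
from xml.etree.ElementTree import fromstring

BASE = "http://podaac.jpl.nasa.gov/ws/search/dataset/"
OPENSEARCH = "{http://a9.com/-/spec/opensearch/1.1/}"
ATOM = "{http://www.w3.org/2005/Atom}"


def search_url(keyword, fmt="atom", start_index=0):
    """Build a dataset search URL (helper name is hypothetical)."""
    query = {"format": fmt, "keyword": keyword, "startIndex": start_index}
    return BASE + "?" + urlencode(query)


def summarize(feed_xml):
    """Return (total result count, entry titles) from an Atom response."""
    root = fromstring(feed_xml)
    total = int(root.findtext(OPENSEARCH + "totalResults"))
    titles = [entry.findtext(ATOM + "title") for entry in root.findall(ATOM + "entry")]
    return total, titles


# Made-up response, shaped like an OpenSearch Atom feed.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom"
      xmlns:opensearch="http://a9.com/-/spec/opensearch/1.1/">
  <opensearch:totalResults>2</opensearch:totalResults>
  <entry><title>Dataset A</title></entry>
  <entry><title>Dataset B</title></entry>
</feed>"""

print(search_url("ocean"))
print(summarize(SAMPLE_FEED))
```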

8 Metadata Service
- Common URL query to request dataset and granule metadata in various standards
- Formats supported:
  - iso – GHRSST GDS 2.0
  - gcmd (dataset only) – Global Change Master Directory
  - fgdc – Federal Geographic Data Committee
  - Datacasting

```shell
curl -X GET "http://podaac.jpl.nasa.gov/ws/metadata/dataset/?format=iso&shortName=OSDPD-L2P-MSG02"
```
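Because only the `format` value changes between standards, a client can request the same dataset in each one with a single helper. The sketch below only builds the URLs; the helper name and the format tuple are assumptions, and the Datacasting format is left out because the slide does not give its identifier.

```python
from urllib.parse import urlencode

METADATA_BASE = "http://podaac.jpl.nasa.gov/ws/metadata/dataset/"
# Format identifiers named on the slide for dataset-level metadata.
DATASET_FORMATS = ("iso", "gcmd", "fgdc")


def dataset_metadata_url(short_name, fmt="iso"):
    """Build a dataset metadata URL (helper name is hypothetical)."""
    if fmt not in DATASET_FORMATS:
        raise ValueError("unsupported format: %r" % fmt)
    return METADATA_BASE + "?" + urlencode({"format": fmt, "shortName": short_name})


# One URL per standard for the same dataset.
urls = {fmt: dataset_metadata_url("OSDPD-L2P-MSG02", fmt) for fmt in DATASET_FORMATS}
for fmt, url in urls.items():
    print(fmt, url)
```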

9 ISO Metadata Model
ISO 19115-2 metadata model for GHRSST GDS2 data sets, utilizing MI_Metadata
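For readers unfamiliar with MI_Metadata, the skeleton of such a record can be sketched with the standard library. This is a bare illustration that emits only the root element and a file identifier; using the dataset short name as the identifier is an assumption, and a real GHRSST record carries many more sections.

```python
import xml.etree.ElementTree as ET

# Namespaces used by the ISO 19115-2 XML encoding.
NS = {
    "gmi": "http://www.isotc211.org/2005/gmi",
    "gmd": "http://www.isotc211.org/2005/gmd",
    "gco": "http://www.isotc211.org/2005/gco",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return the namespace-qualified tag name ElementTree expects."""
    return "{%s}%s" % (NS[prefix], tag)


def minimal_mi_metadata(file_identifier):
    """Build a skeleton MI_Metadata element holding only a file identifier."""
    root = ET.Element(q("gmi", "MI_Metadata"))
    ident = ET.SubElement(root, q("gmd", "fileIdentifier"))
    ET.SubElement(ident, q("gco", "CharacterString")).text = file_identifier
    return root


# Assumption: the dataset short name doubles as the file identifier.
doc = ET.tostring(minimal_mi_metadata("OSDPD-L2P-MSG02"), encoding="unicode")
print(doc)
```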

10 Metadata to ISO (GHRSST)
[Figure: diagram from the GHRSST Data Processing Specification version 2, Metadata Conventions, depicting the workflow of metadata translation for both dataset-level and granule-level (file-level) metadata to ISO 19115-2.]

11 Example ISO Export Script
Example Python script to export dataset metadata in ISO format and store it in individual XML files.

```python
#!/usr/bin/env python3
"""Export ISO 19115 metadata for each matching dataset to its own XML file."""
from urllib.request import urlopen, urlretrieve
from xml.etree.ElementTree import parse

SEARCH_URL = ('http://podaac.jpl.nasa.gov/ws/search/dataset/'
              '?format=atom&pretty=false&keyword=amsr-e&startIndex=%d')
namespace = {"podaac": "http://podaac.jpl.nasa.gov/opensearch/",
             "opensearch": "http://a9.com/-/spec/opensearch/1.1/",
             "atom": "http://www.w3.org/2005/Atom"}

startIndex = 0
totalResults = 1  # placeholder so the loop runs once; set from the first response
while startIndex < totalResults:
    # Fetch one page of search results as an Atom feed.
    with urlopen(SEARCH_URL % startIndex) as response:
        xml = parse(response)
    totalResults = int(xml.find('{%(opensearch)s}totalResults' % namespace).text)
    itemsPerPage = int(xml.find('{%(opensearch)s}itemsPerPage' % namespace).text)
    if itemsPerPage <= 0:
        break  # guard against looping forever on an empty page
    startIndex += itemsPerPage

    for elem in xml.findall('{%(atom)s}entry' % namespace):
        datasetId = elem.findtext('{%(podaac)s}datasetId' % namespace)
        link = elem.find("{%(atom)s}link[@title='ISO-19115 Metadata']" % namespace)
        # Skip entries that lack an ID or an ISO metadata link.
        if datasetId and link is not None:
            urlretrieve(link.attrib['href'], "%s.iso.xml" % datasetId)
```

12 Some of Our ISO 19115-2 Challenges
- Challenges in implementing ISO metadata: how best to separate and combine metadata to describe collections vs. granules
- Development challenges: maintenance of our internal template when errors are discovered
- Need more work to describe quality information in granules and datasets
- Certain ISO metadata objects require the following:
  - Opening the granule file to retrieve necessary information
    - MD_SpatialRepresentation needs dimension size, resolution
    - MD_ContentInformation, specifically MI_CoverageDescription, needs physical measurement variables
  - Input from provider and/or data engineers
    - DQ_DataQuality needs identification of what the quality flags are per dataset
  - Collection of external information
    - MD_DistributionInfo, for example, needs information about remote distributors, e.g. URL, contact person
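To make the MD_SpatialRepresentation point concrete, here is a rough sketch of deriving dimension size and resolution once a granule has been opened. It assumes the coordinate variables have already been read into plain lists and are evenly spaced; the function name and the 1-degree grid are made up for illustration, and real granules (swath data in particular) may need a different approach.

```python
def axis_summary(values):
    """Return (size, resolution) for one evenly spaced coordinate axis."""
    size = len(values)
    if size < 2:
        return size, None  # resolution is undefined for a single point
    # Mean spacing between the first and last coordinate values.
    resolution = abs(values[-1] - values[0]) / (size - 1)
    return size, resolution


# Hypothetical 1-degree global grid standing in for values read from a granule.
lat = [-89.5 + i for i in range(180)]
lon = [-179.5 + i for i in range(360)]

spatial = {"latitude": axis_summary(lat), "longitude": axis_summary(lon)}
print(spatial)
```

The need to open every granule for values like these is what makes such fields costlier to populate than catalog-level metadata.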

13 Onward
- ISO metadata quality improvements
  - Data description improvement for consistency
  - Resolving missing attributes (ISO)
- Already in progress
  - Use ISO metadata to describe quality information within a granule: which variables contain quality flags and other filtering information, and what those flags mean
  - Have a tool read this information and expose it to the user
- EDGE
  - Support ElasticSearch backend
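A tool that exposes flag meanings to the user might look roughly like the sketch below. It assumes the flag description is available as CF-style `flag_masks` and `flag_meanings` attributes; whether the tool reads these from the ISO record or the granule itself is not stated in the slides, and the flag names here are invented.

```python
def decode_flags(value, flag_masks, flag_meanings):
    """List the meanings of every flag bit set in one quality value."""
    names = flag_meanings.split()  # CF stores meanings as a space-separated string
    return [name for mask, name in zip(flag_masks, names) if value & mask]


# Hypothetical attributes for a quality-flag variable.
attrs = {
    "flag_masks": [1, 2, 4],
    "flag_meanings": "land ice cloud_contaminated",
}

# 5 is binary 101, so the first and third flags are set.
print(decode_flags(5, attrs["flag_masks"], attrs["flag_meanings"]))
```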

14 Summary
- There will always be
  - Metadata standards or recommendations
  - Different (maybe better) ways to look for data
- Why did PO.DAAC decide to invest in EDGE?
  - No need to redo the plumbing for each new metadata standard
  - Portable platform to integrate with local/external data services
  - Allows us to focus on the domain: metadata standards and metadata resources

15 THANKS
Thomas.Huang@jpl.nasa.gov
Edward.M.Armstrong@jpl.nasa.gov

