Published by Maximillian Park. Modified over 9 years ago.
1
Matthew Cechini Raytheon - EED ID: IN31C-07
2
ECHO Metadata Overview Introduction Problem Space Solutions ISO 19115 Lessons Learned – Perceived Issues – Gotchas – Kudos Conclusion
3
Earth Observing System (EOS) ClearingHOuse (ECHO)
An integral component of metadata management within NASA’s Earth Observing System Data and Information System (EOSDIS), acting as the core metadata repository and providing a centralized mechanism for metadata and data discovery and retrieval.
How metadata is used by ECHO
▪ Discovery
▪ Presentation/Documentation
▪ Interoperability
▪ Validation
Metadata Format Landscape
▪ Existing catalog utilizes ECHO format (based upon ECS data model).
▪ Future science missions projected to provide ISO 19115 metadata.
4
Data discovery and retrieval tenets:
1. There exists a set of users who will require the entire metadata record for advanced analysis.
2. There exists a set of ‘core’ metadata fields recommended for data discovery.
3. There exists a set of users who will require a ‘core’ set of metadata fields for discovery only.
4. There will never be a cessation of new formats or a total retirement of all old formats.
5. Users should be presented metadata in a consistent format of their choosing.
5
ECHO’s metadata processing solution:
1. Identify a cross-format set of ‘core’ metadata fields for discovery.
2. Implement format-specific indexers to extract the ‘core’ metadata fields into an optimized query capability.
3. Archive the original metadata in its entirety for presentation to users requiring the full record.
4. Provide on-demand translation of ‘core’ metadata to any supported result format or standard.
ECHO’s usage of ISO 19115/19139:
1. Archive original metadata for documentation and advanced usage.
2. Extract ‘core’ metadata fields for data discovery.
3. Provide format translations from ISO to/from supported formats.
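The processing steps above can be sketched roughly as follows. This is an illustration only, not ECHO's implementation: the format names (format_a, format_b), the element paths, and the three 'core' fields are all hypothetical, since the slides do not enumerate the real field set or schemas.

```python
import xml.etree.ElementTree as ET

# Hypothetical element paths for each 'core' field in two made-up formats.
CORE_PATHS = {
    "format_a": {"title": "Title", "start": "Temporal/Start", "end": "Temporal/End"},
    "format_b": {"title": "citation/name", "start": "extent/begin", "end": "extent/end"},
}


def extract_core(fmt, xml_text):
    """Format-specific indexer: pull only the 'core' fields out of a record."""
    root = ET.fromstring(xml_text)
    return {field: root.findtext(path) for field, path in CORE_PATHS[fmt].items()}


def ingest(catalog, record_id, fmt, xml_text):
    """Archive the original in full and store the extracted 'core' fields beside it."""
    catalog[record_id] = {
        "format": fmt,
        "original": xml_text,  # kept verbatim for users who need the full record
        "core": extract_core(fmt, xml_text),
    }


def translate_core(catalog, record_id, target_fmt):
    """On-demand translation of the 'core' fields into another supported format."""
    core = catalog[record_id]["core"]
    root = ET.Element("record")
    for field, path in CORE_PATHS[target_fmt].items():
        node = root
        for tag in path.split("/"):
            child = node.find(tag)
            if child is None:
                child = ET.SubElement(node, tag)
            node = child
        node.text = core[field]
    return ET.tostring(root, encoding="unicode")
```

Only the 'core' fields survive translation in this sketch; anything else in the record is available solely through the archived original, which mirrors the split between discovery and full-record use described in the tenets.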
6
ECHO Metadata Overview Introduction Problem Space Solutions ISO 19115 Lessons Learned – Perceived Issues – Gotchas – Kudos Conclusion
7
MimeType
The existing standard could be included, similar to how GML is incorporated, though maintained separately.
MimeType values facilitate automated access where different file types result in different workflows (e.g. displaying native jpg images or extracting from hdf).
File extensions are not always indicative of the file type.
Type
Code List values promote interoperability, but potentially reduce the ability for intra-community customization.
A type attribute allows for more detailed identification for automated access (e.g. specific service protocols such as http://xml.opendap.org/ns/DAP/3.3#).
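As a rough illustration of why a MimeType and a type attribute help automated access, the sketch below picks a workflow from the metadata rather than from the file extension. The handler functions and the `application/x-hdf` value are assumptions made for the example; only the OPeNDAP namespace URL comes from the slide.

```python
# Hypothetical workflow handlers; real ones would render or read the data.
def display_image(url):
    return f"display image: {url}"


def extract_from_hdf(url):
    return f"extract from HDF: {url}"


def query_opendap(url):
    return f"query OPeNDAP service: {url}"


def download(url):
    return f"plain download: {url}"


# 'application/x-hdf' is an assumed MIME type for HDF files.
BY_MIME_TYPE = {
    "image/jpeg": display_image,
    "application/x-hdf": extract_from_hdf,
}

# The type attribute identifies a specific service protocol.
BY_LINK_TYPE = {
    "http://xml.opendap.org/ns/DAP/3.3#": query_opendap,
}


def choose_workflow(url, mime_type=None, link_type=None):
    """Prefer the more specific type attribute, then the MIME type, then a download."""
    handler = BY_LINK_TYPE.get(link_type) or BY_MIME_TYPE.get(mime_type) or download
    return handler(url)
```

A URL such as `http://example.com/browse/12345` carries no extension at all, yet `choose_workflow(url, mime_type="image/jpeg")` still routes it to the image workflow.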
8
Data Discovery
How are links to discovery services made available (e.g. data casting feeds or search endpoints)?
Endpoints may support multiple response formats; how would that be included?
Data Processing
Support for data processing links appears to be missing.
Both series and dataset level metadata may have URLs to services that expose subsetting, projection, and other services.
Some service-specific information may be required and will need to be included in the metadata.
9
Representation
Non-Standard Delimiters
▪ A self-defining hierarchy could be introduced within the keyword structure allowing for customized keyword lists.
Automated Usage
Optional Fields
▪ A flat representation of keyword structures that have optional levels may cause issues for automated keyword parsing.
▪ Translation into a metadata format where hierarchy is expected may not be possible.
Earth Science > Oceans > Ocean Temperature > Sub-skin Sea Surface Temperature
Earth Science | Oceans | Ocean Temperature | Sub-skin Sea Surface Temperature
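The parsing concern can be made concrete with a small sketch. It assumes that only `>` and `|` are used as delimiters and that the hierarchy has four levels with the names shown; both are assumptions for illustration, not something the slides or the standard specify.

```python
import re

# Assumed level names for a four-level keyword hierarchy.
LEVELS = ("category", "topic", "term", "variable")


def split_keyword(flat):
    """Split a flat keyword string on either '>' or '|' and trim whitespace."""
    return [part.strip() for part in re.split(r"[>|]", flat) if part.strip()]


def to_hierarchy(flat):
    """Assign the parts to levels purely by position."""
    return dict(zip(LEVELS, split_keyword(flat)))
```

Both example strings above split into the same four parts. If an optional level is omitted, however, as in `Earth Science > Oceans > Sub-skin Sea Surface Temperature`, positional assignment places the last part under `term` and leaves `variable` empty; a parser cannot tell from the flat string which level was skipped.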
10
Coordinate Systems
Cartesian vs. Geodetic
▪ EX_GeographicBoundingBox does not specify a coordinate system.
Two-D Coordinate Systems
▪ Unable to find where coordinate reference systems like WRS-2 and MODIS H/V tiling are a) defined and b) utilized.
Orbit Metadata
Series Level
▪ Unable to find where series level orbit metadata is represented (e.g. swath width, period, inclination angle, etc…).
▪ This information may be required for data discovery.
Dataset Level
▪ Similar concern regarding placement of orbit metadata, again used for discovery (e.g. orbit number, crossing longitude, etc…).
11
Terminology
Natural difficulties reconciling terminology between communities.
▪ Dataset & Granules vs. Series & Dataset
▪ Archive Center vs. Custodian
Codelists are a double-edged sword, providing consistency but removing specificity and community vernacular.
Citation Overload
Contact information can be represented in numerous locations.
Potentially stale contact information may be difficult to track down.
Combined Series & Dataset Metadata
Good Idea… Combining series and dataset metadata during presentation.
Bad Idea… Combining series and dataset metadata during archival.
12
Citations
Thorough support for providing citations within the metadata.
Metadata Lineage
ISO lineage provides an excellent means to capture repeatable processing history information.
Distribution Information
Thorough support for online and offline access options including support for ordering.
13
ISO 19115 is on its way to becoming a viable standard for metadata as a means of documentation.
ISO 19115 is a bit verbose for the pragmatic requirements of data discovery (specifically at the dataset level).
ISO 19115 lacks support for the growing presence of data processing services.
All metadata standards are expected to have issues and will improve over time.
14
http://xkcd.com/927/ Matthew.F.Cechini@nasa.gov Moscone South: IN41B-1406 - Dec. 8, 8:00am-12:20pm