A Multi-Discipline Metadata Registry for Science Interoperability


1 A Multi-Discipline Metadata Registry for Science Interoperability J. Steven Hughes/JPL - steve.hughes@jpl.nasa.gov Daniel J. Crichton/JPL - daniel.crichton@jpl.nasa.gov Jason J. Hyon/JPL - jason.hyon@jpl.nasa.gov Sean C. Kelly/UTA - sean.kelly@jpl.nasa.gov Open Forum on Metadata Registries January 17-21, 2000

2 A Multi-Discipline Metadata Registry for Science Interoperability Problem Statement System Overview Profile Development Conclusion and Outstanding Issues

3 Problem Statement
Space scientists cannot easily locate or use data across the hundreds if not thousands of autonomous, heterogeneous, and distributed data systems currently in the Space Science community.
Heterogeneous Systems
–Data Management - RDBMS, ODBMS, HomeGrownDBMS, BinaryFiles
–Platforms - UNIX, LINUX, WIN3.x/9x/NT, Mac, VMS, …
–Interfaces - Web, Windows, Command Line
–Data Formats - HDF, CDF, NetCDF, PDS, FITS, VICAR, ASCII,...
–Data Volume - KiloBytes to TeraBytes
Heterogeneous Disciplines
–Moving targets and stationary targets
–Multiple coordinate systems
–Multiple data object types (images, cubes, time series, spectra, tables, binary, document)
–Multiple interpretations of single object types
–Multiple software solutions to same problem.
–Incompatible and/or missing metadata

4 Problem Statement
Current Capabilities - Example Systems
Astrobrowse - Web-based CGI-Script
–Distributed searches for products across hundreds of Astrophysics sites
–Search limited to target identifiers and RA/DEC specified areas
–Limited to stationary (identified) targets
–Not readily extensible to other coordinate systems or moving targets
PDSBrowse - Web-based CGI-Scripts
–Searches for data sets and resources across eight distributed nodes
–Search allowed on any defined attribute
–Limited to identified targets
–Product searches relegated to identified resources (catalogs)

5 Problem Statement
Current Capabilities - Example Queries
Current
–Planetary - PDSBrowse - Eight nodes
  –Constrain on “image” and read description of the 15 resulting resources.
  –Link to selected resources and perform search for image.
–Astrophysics - ASTROBrowse - Hundreds of nodes
  –Submit query for target = Jupiter
  –“Jupiter” might not be associated with all images containing Jupiter.
  –Object name might have aliases
  –Convert images to usable format.
Requested
–OmniBrowse
  –Find science_products where data_object_type = image and target = “JUPITER”
  –Find science_products where data_object_type = image and target = all_aliases (“JUPITER”) and convert_to(JPEG)
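
The difference between the two requested OmniBrowse queries can be illustrated with a minimal Python sketch. Everything here is hypothetical: the alias table, the product records, and the function names `all_aliases` and `find_science_products` are invented for illustration and are not part of any system described in the slides. The `convert_to(JPEG)` step is left out.

```python
# Hypothetical alias table; a real system would draw this from a data dictionary.
ALIASES = {"JUPITER": {"JUPITER", "JUP", "PLANET_5"}}

# Hypothetical product records, each a flat set of metadata keywords.
PRODUCTS = [
    {"id": "p1", "data_object_type": "IMAGE", "target": "JUPITER"},
    {"id": "p2", "data_object_type": "IMAGE", "target": "JUP"},
    {"id": "p3", "data_object_type": "TABLE", "target": "JUPITER"},
    {"id": "p4", "data_object_type": "IMAGE", "target": "MARS"},
]


def all_aliases(name):
    """Return every known alias of a target name, including the name itself."""
    key = name.upper()
    return ALIASES.get(key, set()) | {key}


def find_science_products(products, data_object_type, targets):
    """Select products whose type matches and whose target is in `targets`."""
    wanted_type = data_object_type.upper()
    return [
        p for p in products
        if p["data_object_type"].upper() == wanted_type
        and p["target"].upper() in targets
    ]


# First query: exact target name only.
exact = find_science_products(PRODUCTS, "image", {"JUPITER"})
# Second query: target expanded to all of its aliases.
expanded = find_science_products(PRODUCTS, "image", all_aliases("Jupiter"))

print([p["id"] for p in exact])
print([p["id"] for p in expanded])
```

In this toy data set the alias-expanded query also returns the image labeled with the alternate name, which is the behavior the "Requested" item asks for.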

6 Proposed Solution Encapsulate individual data systems. (Hide uniqueness.) Communicate using metadata. (Provide metadata with data.) Enable interoperability based on metadata compatibility. Refocus problem on metadata development.

7 Proposed Solution (cont)
Domain independent data structures
–XML - Standard interchange language
–Metadata management
–Message passing
Domain independent system infrastructure
–CORBA for interoperability between computer systems and languages
–Message passing to simplify interface design
–Standardized reusable server components
Resource descriptions (Metadata)
–Resource profiles
–Domain data dictionaries
–Domain data models
Object Oriented Data Technology Task (OODT)
–Domain independent data management infrastructure

8 System Overview Object Oriented Data Technology Framework [Diagram labels: SeaWinds Staging, OODT Server, PDS Staging, PTI Staging, Profile Server, Query Server, Archive Server, Product Server, Sybase, Oracle, PDS Systems, XML, Scientist, Web Server]

9 System Overview Profile Service Describes a scientific data system via XML –Available datasets and metadata –Types of resources and where they’re located Optionally describes other profile servers [Diagram labels: Profile Server, XML, Data system 1, Data system 2]

10 System Overview Query Service Knows how to “crawl” through servers to produce a result –Crawls through profiles to discover other profiles and product servers –Crawls through product servers to display available products Accessible through CORBA API or through web browser
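
The crawling behavior described on this slide might look roughly like the breadth-first traversal below. This is only a sketch under stated assumptions: the server names, the two dictionaries standing in for remote servers, and the product identifiers are all hypothetical, and the real Query Service reaches its servers through CORBA rather than in-memory lookups.

```python
from collections import deque

# Hypothetical in-memory stand-in for a network of servers. Each profile
# server lists the other profile servers and product servers it knows about.
PROFILE_SERVERS = {
    "root": {"profiles": ["pds", "seawinds"], "products": []},
    "pds": {"profiles": ["root"], "products": ["pds-products"]},
    "seawinds": {"profiles": [], "products": ["seawinds-products"]},
}
PRODUCT_SERVERS = {
    "pds-products": ["VO_2001", "VO_2002"],
    "seawinds-products": ["SW_0001"],
}


def crawl(start):
    """Breadth-first crawl from one profile server; return discovered products."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        server = PROFILE_SERVERS[queue.popleft()]
        # Crawl through product servers to collect available products.
        for product_server in server["products"]:
            found.extend(PRODUCT_SERVERS[product_server])
        # Crawl through profiles to discover other profile servers.
        for other in server["profiles"]:
            if other not in seen:  # guard against cycles such as pds -> root
                seen.add(other)
                queue.append(other)
    return found


print(crawl("root"))
```

The `seen` set matters because profile servers may describe each other, so a crawl without it could loop indefinitely.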

11 System Overview Object Oriented Data Technology Framework

12 Profile Development Objective Design and develop an architecture that will manage the meta-data necessary for identifying and locating science data resources across distributed heterogeneous data systems.

13 Profile Development Approach Choose a common interchange format. Develop a domain generic language. Implement domain specific instances. Model the domain. Capture the meta-data. Develop system to manage the results.

14 Profile Development
Choose a common interchange format
XML
–eXtensible Markup Language
–More expressive than HTML
–Simpler than SGML
–A meta-language used to define domain languages.
  –XSIL - eXtensible Scientific Interchange Language.
  –XIL - Instrument control language.
–Wide acceptance as an interchange format.
  –Electronic data interchange (EDI) standard.
  –Space sciences data systems

15 Profile Development
Develop a domain generic language
Define a generic structure (XML DTD) that can describe heterogeneous domain-specific resources.
Profile - A resource description with sufficient information to determine if the resource satisfies a query.
–Profile elements
  –name, syntax, unit, instance, meaning, alias, …
  –encode all domain attributes and their values specific to this resource
–Resource attributes - id, title, discipline, location_id, …
–Profile attributes - id, title, desc, type, data_dictionary_id, …
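
One way to read "sufficient information to determine if the resource satisfies a query" is sketched below in Python. The dictionary layout, the function names, and the `ORBIT_NUMBER` element are assumptions made for illustration; the slides do not specify how matching is implemented. Each profile element is assumed to carry either enumerated instances or a minimum/maximum range.

```python
def element_matches(element, value):
    """True if `value` is an enumerated instance or lies within the min/max range."""
    if value in element.get("instances", ()):
        return True
    lo, hi = element.get("minimum"), element.get("maximum")
    # Range values are assumed to be mutually comparable (e.g. all numbers).
    return lo is not None and hi is not None and lo <= value <= hi


def profile_satisfies(profile, query):
    """A profile satisfies a query if every constrained element matches."""
    return all(
        name in profile and element_matches(profile[name], value)
        for name, value in query.items()
    )


# Hypothetical profile: two enumerated elements and one range element.
profile = {
    "TARGET_NAME": {"instances": ["MARS", "DEIMOS", "PHOBOS"]},
    "DATA_OBJECT_TYPE": {"instances": ["IMAGE"]},
    "ORBIT_NUMBER": {"minimum": 1, "maximum": 500},  # invented for illustration
}

print(profile_satisfies(profile, {"TARGET_NAME": "MARS", "ORBIT_NUMBER": 42}))
print(profile_satisfies(profile, {"TARGET_NAME": "JUPITER"}))
```

A query that constrains an element the profile does not describe is treated as unsatisfied here; a real system might instead forward such queries to the resource.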

16 Profile Development
Develop a domain generic language
prof.dtd
<!ELEMENT PROFILES (PROFILE+)>
<!ELEMENT PROFILE (PROFILE_ATTRIBUTES, RESOURCE)>
<!ATTLIST PROFILE PROFILE_ID CDATA #REQUIRED>
<!ELEMENT PROFILE_ATTRIBUTES (ID, TITLE*, DESC*, TYPE*, STATUS_ID*, SECURITY_TYPE*, PARENT_ID*, CHILD_ID*, REVISION_NOTE*, DATA_DICTIONARY_ID*)>
<!ELEMENT RESOURCE (RESOURCE_ATTRIBUTES, PROFILE_ELEMENT*)>
<!ELEMENT RESOURCE_ATTRIBUTES (RESOURCE_ID, RESOURCE_TITLE, RESOURCE_DISCIPLINE, RESOURCE_AGGREGATION, RESOURCE_CLASS, RESOURCE_LOCATION_ID, RESULT_MIME_TYPE)>
<!ELEMENT PROFILE_ELEMENT (ELEMENT_NAME, ELEMENT_MEANING*, ELEMENT_ALIAS*, VALUE_SYNTAX*, VALUE_UNIT*, (VALUE_INSTANCE | (MINIMUM_VALUE, MAXIMUM_VALUE))*)>
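
A document following this DTD could be assembled with Python's standard `xml.etree.ElementTree`, as in the sketch below. The helper names `text_child` and `build_profile` are hypothetical. Two assumptions go beyond the DTD as shown: leaf elements such as ID and TITLE are taken to hold plain text, and the required PROFILE_ID attribute is set to the same value as ID. The sample values come from the PDS Distributed Inventory System example later in this transcript.

```python
import xml.etree.ElementTree as ET

# Element order required by RESOURCE_ATTRIBUTES in prof.dtd.
RESOURCE_TAGS = [
    "RESOURCE_ID", "RESOURCE_TITLE", "RESOURCE_DISCIPLINE",
    "RESOURCE_AGGREGATION", "RESOURCE_CLASS",
    "RESOURCE_LOCATION_ID", "RESULT_MIME_TYPE",
]


def text_child(parent, tag, text):
    """Append <tag>text</tag> to parent (leaf content assumed to be plain text)."""
    child = ET.SubElement(parent, tag)
    child.text = text
    return child


def build_profile(profile_id, title, resource, elements):
    """Build a PROFILES tree holding one PROFILE, in the order the DTD lists."""
    profiles = ET.Element("PROFILES")
    # Assumption: PROFILE_ID repeats the ID element's value.
    profile = ET.SubElement(profiles, "PROFILE", PROFILE_ID=profile_id)
    attrs = ET.SubElement(profile, "PROFILE_ATTRIBUTES")
    text_child(attrs, "ID", profile_id)
    text_child(attrs, "TITLE", title)
    res = ET.SubElement(profile, "RESOURCE")
    res_attrs = ET.SubElement(res, "RESOURCE_ATTRIBUTES")
    for tag in RESOURCE_TAGS:
        text_child(res_attrs, tag, resource[tag])
    for name, instances in elements.items():
        element = ET.SubElement(res, "PROFILE_ELEMENT")
        text_child(element, "ELEMENT_NAME", name)
        for instance in instances:
            text_child(element, "VALUE_INSTANCE", instance)
    return profiles


doc = build_profile(
    "PROFILE_PDS_DIS_V1.3.n",
    "Planetary Data System - Distributed Inventory System - Profile V1.0",
    {
        "RESOURCE_ID": "PDS_DIS_V1.3.n",
        "RESOURCE_TITLE": "Planetary Data System - Distributed Inventory System",
        "RESOURCE_DISCIPLINE": "PDS",
        "RESOURCE_AGGREGATION": "GRANULE+",
        "RESOURCE_CLASS": "INVENTORY",
        "RESOURCE_LOCATION_ID": "http://pds.jpl.nasa.gov/pdsbrows.htm",
        "RESULT_MIME_TYPE": "text/html",
    },
    {"TARGET_NAME": ["IDA", "JUPITER"]},
)
print(ET.tostring(doc, encoding="unicode"))
```

ElementTree does not validate against a DTD, so the element order is enforced here only by the order in which the helper appends children.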

17 Profile Development
Develop a domain generic language
Specialize the profile class
–Profile - One profile to one resource (e.g. catalog)
–Inventory - One profile to many resources (e.g. catalog entries)
  –Minimized profile element
–Dictionary - One profile to one discipline
  –Comprehensive profile elements
    –aliases
    –meanings

18 Profile Development
Develop a domain generic language
Profile element hierarchy (given a domain data dictionary)
Dictionary
–keywords - all keywords in domain
–keyword values - union of all values in domain
–e.g. TARGET_NAME = {ADRASTEA, …, VENUS}
Profile
–keywords - all keywords associated with a domain resource
–keyword values - union of all values in domain resource
–e.g. TARGET_NAME = {MARS, DEIMOS, PHOBOS}
Inventory
–keywords - keywords associated with inventory item.
–keyword values - values for inventory item.
–e.g. TARGET_NAME = {MARS}
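
The hierarchy above amounts to a containment relation: inventory values fall within profile values, which fall within dictionary values. A small Python sketch of that check follows. The function names are hypothetical, and the dictionary-level value set is an abbreviated stand-in, since the slide elides the full list between ADRASTEA and VENUS.

```python
def is_refinement(narrow, broad):
    """True if every keyword in `narrow` exists in `broad` with a subset of its values."""
    return all(k in broad and vals <= broad[k] for k, vals in narrow.items())


def union_values(levels):
    """Union the keyword values of several lower-level descriptions."""
    merged = {}
    for level in levels:
        for keyword, vals in level.items():
            merged.setdefault(keyword, set()).update(vals)
    return merged


# Abbreviated stand-in for the dictionary-level list; the slide elides most of it.
dictionary = {"TARGET_NAME": {"ADRASTEA", "DEIMOS", "MARS", "PHOBOS", "VENUS"}}
profile = {"TARGET_NAME": {"MARS", "DEIMOS", "PHOBOS"}}
inventory_items = [
    {"TARGET_NAME": {"MARS"}},
    {"TARGET_NAME": {"DEIMOS"}},
    {"TARGET_NAME": {"PHOBOS"}},
]

print(is_refinement(profile, dictionary))
print(all(is_refinement(item, profile) for item in inventory_items))
print(union_values(inventory_items) == profile)
```

The last line reflects the slide's wording that a profile's keyword values are the union of the values found in the resource it describes.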

19 Profile Development
Implement domain specific instances
Apply domain generic language to specific domain.
–E.g. Space/Earth Science data and other resources.
Model the domain
–Planetary Science Data Dictionary
–Planetary Data System Data Model
  –Entity-Relationship model
Capture the meta-data
–Extracted from PDS metadata repository

20 Profile Development
Implement domain specific instances
Model the domain (from the start)
Planetary Science Data Dictionary
–Over 1000 Data Elements spanning Planetary Science
–Nomenclature Standard
–Meaning, type, ranges, enumerated values
Planetary Science Data Model
–Developed as Planetary Science enterprise E/R model
–Planetary Science Entities - Spacecraft, Instruments
–Science Data Entities - Data Products
–Data Organization Entities - Volumes
–Management Entities - Nodes, Personnel
–Implemented as Data Set Catalog in an RDBMS

21 Profile Development Planetary Science Standards Architecture Metadata and data standards All data products are labeled Data Model relates entities

22 Profile Development
Implement domain specific instances
Profile Example - PDS Distributed Inventory System
PROFILE_ATTRIBUTES
–ID: PROFILE_PDS_DIS_V1.3.n
–TITLE: Planetary Data System - Distributed Inventory System - Profile V1.0
–DESC: This profile describes the Planetary Data System (PDS) Distributed Inventory System (DIS)...
–TYPE: PROFILE
–DATA_DICTIONARY_ID: OODT_PDS_DATA_SET_DD_V1.0
RESOURCE_ATTRIBUTES
–RESOURCE_ID: PDS_DIS_V1.3.n
–RESOURCE_TITLE: Planetary Data System - Distributed Inventory System
–RESOURCE_DISCIPLINE: PDS
–RESOURCE_AGGREGATION: GRANULE+
–RESOURCE_CLASS: INVENTORY
–RESOURCE_LOCATION_ID: http://pds.jpl.nasa.gov/pdsbrows.htm
–RESULT_MIME_TYPE: text/html
...

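23 Profile Development
Implement domain specific instances
Profile Example (cont) - PDS Distributed Inventory System
…
PROFILE_ELEMENT
–ELEMENT_NAME: DATA_OBJECT_TYPE
–ELEMENT_MEANING: The data_object_type element provides the type...
–VALUE_SYNTAX: ENUMERATION
–VALUE_UNIT: N/A
–VALUE_INSTANCE: IMAGE
PROFILE_ELEMENT
–ELEMENT_NAME: DATA_SET_NAME
–ELEMENT_MEANING: The data_set_name element identifies a PDS data set. -- example...
–VALUE_SYNTAX: ENUMERATION
–VALUE_UNIT: N/A
–VALUE_INSTANCE: VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL...
–VALUE_INSTANCE: VO2 MARS RADIO SCIENCE SUBSYSTEM RESAMPLED LOS...
PROFILE_ELEMENT
–ELEMENT_NAME: TARGET_NAME
–ELEMENT_MEANING: The target_name element provides the names of the targets...
–ELEMENT_ALIAS: ADS.OBJECT_ID
–VALUE_SYNTAX: ENUMERATION
–VALUE_UNIT: N/A
–VALUE_INSTANCE: IDA
–VALUE_INSTANCE: JUPITER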

24 Profile Development
Implement domain specific instances
Inventory Example - PDS Data Set
RESOURCE_ATTRIBUTES
–RESOURCE_ID: VO1/VO2-M-VIS-5-DIM-V1.0
–RESOURCE_TITLE: VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL...
–RESOURCE_DISCIPLINE: PDS
–RESOURCE_AGGREGATION: GRANULE+
–RESOURCE_CLASS: DATA
–RESOURCE_LOCATION_ID: http://pds.jpl.nasa.gov/cgi-bin/pdsserv.pl?OBJECT_ID=PDS100676...
–RESULT_MIME_TYPE: text/html
PROFILE_ELEMENT
–DATA_SET_NAME: VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL...
–DATA_OBJECT_TYPE: TABLE
–TARGET_NAME: MARS
–VOLUME_ID: VO_2001... VO_2014

25 Conclusion Profile Development - Review Choose a common interchange format. (XML) Develop a domain generic language. (X2PL) (XML eXtensible Profile Language) Implement domain specific instances. (Science Resource Profiles) Develop system to manage the results. (Profile Servers)

26 Conclusion
Outstanding Issues
Metadata development remains the problem
Data Dictionary interoperability is required
–NASA Data Entity Dictionary Specification Language (DEDSL) semantics being used to promote compatibility. (Subset of ISO/IEC 11179-3)
–Need XML specification
Data Model interoperability is required

27 Backup Slides

28 System Overview Product Service Queries for scientific datasets from existing data system and formats results in OODT standard format Configurable with “device drivers” that know how to access special data systems [Diagram labels: Product Server, Flat file accessor, Unusual accessor, Query, Result]
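
The "device driver" idea can be sketched as a small plug-in interface in Python. All class names, the record layout, and the result format below are hypothetical; the slides do not define the accessor API or the OODT standard result format. The sketch only shows how one product server might fan a query out to interchangeable accessors and wrap the answers in a common shape.

```python
import csv
import io
from abc import ABC, abstractmethod


class Accessor(ABC):
    """Hypothetical 'device driver': adapts one data system to a common query call."""

    name = "accessor"

    @abstractmethod
    def query(self, constraints):
        """Return records (dicts) whose fields equal every constraint."""


def _matches(record, constraints):
    return all(record.get(k) == v for k, v in constraints.items())


class FlatFileAccessor(Accessor):
    """Reads records from CSV text, standing in for a flat file on disk."""

    name = "flat-file"

    def __init__(self, csv_text):
        self.rows = list(csv.DictReader(io.StringIO(csv_text)))

    def query(self, constraints):
        return [dict(r) for r in self.rows if _matches(r, constraints)]


class InMemoryAccessor(Accessor):
    """Stands in for an unusual data system with its own access method."""

    name = "in-memory"

    def __init__(self, records):
        self.records = records

    def query(self, constraints):
        return [dict(r) for r in self.records if _matches(r, constraints)]


class ProductServer:
    """Sends a query to every configured accessor and returns one uniform list."""

    def __init__(self, accessors):
        self.accessors = accessors

    def query(self, constraints):
        results = []
        for accessor in self.accessors:
            for record in accessor.query(constraints):
                results.append({"source": accessor.name, "record": record})
        return results


server = ProductServer([
    FlatFileAccessor("id,target\nf1,MARS\nf2,JUPITER\n"),
    InMemoryAccessor([{"id": "m1", "target": "MARS"}]),
])
hits = server.query({"target": "MARS"})
print(hits)
```

Adding support for another data system then means writing one more `Accessor` subclass, without changing the server or its callers.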

29 System Overview Web Interface Standard Apache web server with Java servlet engine and JSP processor User retrieves products with any old web browser; no client-side Java necessary [Diagram labels: Scientist, Query Server, Web browser, Web Server, Java Server Page, Various systems, GET http://oodt/search.jsp, Execute Query, HTML Response]

30 System Overview Server-side Java Components Servlets –Replaces CGI scripts and programs –Fast …very fast JSP –Mix Java source text and HTML in same file Java Beans –Dynamic components to model business –Run-time managed by application server

31 System Overview Server-side Java Components CORBA –Standard interobject communication facility –Provides remote method invocation features XML –Self-describing data interchange format –Has many applications: Serialization of objects, bridge to CORBA interfaces, data/metadata packaging, direct-to-user presentation with XSL, and more

