A Multi-Discipline Metadata Registry for Science Interoperability J. Steven Hughes/JPL - Daniel J. Crichton/JPL -

Presentation transcript:

A Multi-Discipline Metadata Registry for Science Interoperability J. Steven Hughes/JPL - Daniel J. Crichton/JPL - Jason J. Hyon/JPL - Sean C. Kelly/UTA - Open Forum on Metadata Registries January 17-21, 2000

A Multi-Discipline Metadata Registry for Science Interoperability Problem Statement System Overview Profile Development Conclusion and Outstanding Issues

Problem Statement
Space scientists cannot easily locate or use data across the hundreds, if not thousands, of autonomous, heterogeneous, and distributed data systems currently in the Space Science community.
Heterogeneous Systems
  Data Management - RDBMS, ODBMS, HomeGrownDBMS, BinaryFiles
  Platforms - UNIX, LINUX, WIN3.x/9x/NT, Mac, VMS, …
  Interfaces - Web, Windows, Command Line
  Data Formats - HDF, CDF, NetCDF, PDS, FITS, VICAR, ASCII, ...
  Data Volume - KiloBytes to TeraBytes
Heterogeneous Disciplines
  Moving targets and stationary targets
  Multiple coordinate systems
  Multiple data object types (images, cubes, time series, spectra, tables, binary, document)
  Multiple interpretations of single object types
  Multiple software solutions to the same problem
  Incompatible and/or missing metadata

Problem Statement
Current Capabilities - Example Systems
Astrobrowse - Web-based CGI script
  Distributed searches for products across hundreds of Astrophysics sites
  Search limited to target identifiers and RA/DEC specified areas
  Limited to stationary (identified) targets
  Not readily extensible to other coordinate systems or moving targets
PDSBrowse - Web-based CGI scripts
  Searches for data sets and resources across eight distributed nodes
  Search allowed on any defined attribute
  Limited to identified targets
  Product searches relegated to identified resources (catalogs)

Problem Statement
Current Capabilities - Example Queries
Current
  Planetary - PDSBrowse - Eight nodes
    Constrain on “image” and read description of the 15 resulting resources.
    Link to selected resources and perform search for image.
  Astrophysics - ASTROBrowse - Hundreds of nodes
    Submit query for target = Jupiter.
    “Jupiter” might not be associated with all images containing Jupiter.
    Object name might have aliases.
    Convert images to usable format.
Requested
  OmniBrowse
    Find science_products where data_object_type = image and target = “JUPITER”
    Find science_products where data_object_type = image and target = all_aliases (“JUPITER”) and convert_to(JPEG)
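
The requested alias-aware query can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alias table, the record layout, and the function names `all_aliases` and `find_science_products` are hypothetical stand-ins for the slide's pseudo-query, and the `convert_to(JPEG)` step is left out.

```python
from typing import Dict, List, Set

# Hypothetical alias table; a real system would draw this from a domain data dictionary.
ALIASES: Dict[str, Set[str]] = {
    "JUPITER": {"JUPITER", "JOVE"},  # "JOVE" is an invented alias for illustration
}


def all_aliases(name: str) -> Set[str]:
    """Return the upper-cased name plus any aliases registered for it."""
    key = name.upper()
    return ALIASES.get(key, {key})


def find_science_products(products: List[dict], data_object_type: str, target: str) -> List[dict]:
    """Select products of the given type whose target matches the name or an alias."""
    wanted = all_aliases(target)
    return [
        p for p in products
        if p["data_object_type"].upper() == data_object_type.upper()
        and p["target"].upper() in wanted
    ]


# Illustrative records only; the ids and values are made up.
PRODUCTS = [
    {"id": "p1", "data_object_type": "IMAGE", "target": "JUPITER"},
    {"id": "p2", "data_object_type": "IMAGE", "target": "Jove"},
    {"id": "p3", "data_object_type": "TABLE", "target": "JUPITER"},
    {"id": "p4", "data_object_type": "IMAGE", "target": "MARS"},
]

matches = find_science_products(PRODUCTS, "image", "Jupiter")
print([p["id"] for p in matches])
```

Without the alias expansion, the record labelled "Jove" would be missed, which is the gap the slide attributes to the current systems.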

Proposed Solution Encapsulate individual data systems. (Hide uniqueness.) Communicate using metadata. (Provide metadata with data.) Enable interoperability based on metadata compatibility. Refocus problem on metadata development.

Proposed Solution (cont)
Domain independent data structures
  – XML - Standard interchange language
  – Metadata management
  – Message passing
Domain independent system infrastructure
  – CORBA for interoperability between computer systems and languages
  – Message passing to simplify interface design
  – Standardized reusable server components
Resource descriptions (Metadata)
  – Resource profiles
  – Domain data dictionaries
  – Domain data models
Object Oriented Data Technology Task (OODT)
  – Domain independent data management infrastructure

System Overview
Object Oriented Data Technology Framework
[Diagram labels: SeaWinds Staging, OODT Server, PDS Staging, PTI Staging; Profile Server, Query Server, Archive Server, Product Server; Sybase, Oracle, PDS Systems; XML; Scientist; Web Server]

System Overview
Profile Service
Describes a scientific data system via XML
  – Available datasets and metadata
  – Types of resources and where they’re located
Optionally describes other profile servers
[Diagram labels: Profile Server, XML, Data system 1, Data system 2]

System Overview Query Service Knows how to “crawl” through servers to produce a result –Crawls through profiles to discover other profiles and product servers –Crawls through product servers to display available products Accessible through CORBA API or through web browser
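
The crawling behavior described for the Query Service can be sketched as a breadth-first traversal. This is a minimal in-memory illustration, not the OODT implementation: the `registry` dictionary, the server names, and the `crawl` function are assumptions, and a real query server would contact remote profile servers (for example over CORBA) rather than read a local dictionary.

```python
from collections import deque
from typing import Dict, List

# Hypothetical registry: each profile server lists the other profile servers
# and the product servers that it describes.
Registry = Dict[str, Dict[str, List[str]]]


def crawl(start: str, registry: Registry) -> List[str]:
    """Walk profile servers breadth-first and collect the product servers found."""
    seen = set()
    products: List[str] = []
    queue = deque([start])
    while queue:
        server = queue.popleft()
        if server in seen:
            continue  # profile servers may reference each other, so skip repeats
        seen.add(server)
        entry = registry.get(server, {})
        for product_server in entry.get("product_servers", []):
            if product_server not in products:
                products.append(product_server)
        queue.extend(entry.get("profile_servers", []))
    return products


# Made-up topology for illustration.
registry: Registry = {
    "root": {"profile_servers": ["pds", "seawinds"], "product_servers": []},
    "pds": {"profile_servers": ["root"], "product_servers": ["pds-products"]},
    "seawinds": {"profile_servers": [], "product_servers": ["seawinds-products"]},
}

print(crawl("root", registry))
```

The `seen` set matters because profile servers may optionally describe other profile servers, so cycles are possible.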

System Overview Object Oriented Data Technology Framework

Profile Development Objective Design and develop an architecture that will manage the meta-data necessary for identifying and locating science data resources across distributed heterogeneous data systems.

Profile Development Approach Choose a common interchange format. Develop a domain generic language. Implement domain specific instances. Model the domain. Capture the meta-data. Develop system to manage the results.

Profile Development
Choose a common interchange format
XML - eXtensible Markup Language
  More expressive than HTML
  Simpler than SGML
  A meta-language used to define domain languages
    XSIL - eXtensible Scientific Interchange Language
    XIL - Instrument control language
  Wide acceptance as an interchange format
    Electronic data interchange (EDI) standard
    Space sciences data systems

Profile Development Develop a domain generic language Define a generic structure (XML DTD) that can describe heterogeneous domain-specific resources. Profile - A resource description with sufficient information to determine if the resource satisfies a query. Profile elements name, syntax, unit, instance, meaning, alias, … encodes all domain attributes and their values specific to this resource Resource attributes - id, title, discipline, location_id, … Profile attributes - id, title, desc, type, data_dictionary_id, …

Profile Development
Develop a domain generic language
prof.dtd
  <!ELEMENT PROFILES (PROFILE+)>
  <!ELEMENT PROFILE (PROFILE_ATTRIBUTES, RESOURCE)>
  <!ATTLIST PROFILE PROFILE_ID CDATA #REQUIRED>
  <!ELEMENT PROFILE_ATTRIBUTES (ID, TITLE*, DESC*, TYPE*, STATUS_ID*,
      SECURITY_TYPE*, PARENT_ID*, CHILD_ID*, REVISION_NOTE*,
      DATA_DICTIONARY_ID*)>
  <!ELEMENT RESOURCE (RESOURCE_ATTRIBUTES, PROFILE_ELEMENT*)>
  <!ELEMENT RESOURCE_ATTRIBUTES (RESOURCE_ID, RESOURCE_TITLE,
      RESOURCE_DISCIPLINE, RESOURCE_AGGREGATION, RESOURCE_CLASS,
      RESOURCE_LOCATION_ID, RESULT_MIME_TYPE)>
  <!ELEMENT PROFILE_ELEMENT (ELEMENT_NAME, ELEMENT_MEANING*, ELEMENT_ALIAS*,
      VALUE_SYNTAX*, VALUE_UNIT*,
      (VALUE_INSTANCE | (MINIMUM_VALUE, MAXIMUM_VALUE))*)>
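
As a rough illustration of what an instance of this DTD looks like, the sketch below builds a small profile document with Python's standard `xml.etree.ElementTree`. The element names and their order come from prof.dtd; the function `build_profile`, the identifiers, and every field value except `MARS`, `TABLE`, `PDS`, and `text/html` are placeholders invented for the example. ElementTree does not validate against a DTD, so the code simply emits children in the declared order.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

# Children of RESOURCE_ATTRIBUTES, in the order prof.dtd declares them.
RESOURCE_FIELDS = [
    "RESOURCE_ID", "RESOURCE_TITLE", "RESOURCE_DISCIPLINE", "RESOURCE_AGGREGATION",
    "RESOURCE_CLASS", "RESOURCE_LOCATION_ID", "RESULT_MIME_TYPE",
]


def build_profile(profile_id: str, resource: Dict[str, str],
                  elements: Dict[str, List[str]]) -> ET.Element:
    """Build a PROFILES document holding one PROFILE with enumerated values."""
    root = ET.Element("PROFILES")
    profile = ET.SubElement(root, "PROFILE", {"PROFILE_ID": profile_id})
    attrs = ET.SubElement(profile, "PROFILE_ATTRIBUTES")
    ET.SubElement(attrs, "ID").text = profile_id  # ID is the only required child
    res = ET.SubElement(profile, "RESOURCE")
    res_attrs = ET.SubElement(res, "RESOURCE_ATTRIBUTES")
    for tag in RESOURCE_FIELDS:  # all seven are required by the DTD
        ET.SubElement(res_attrs, tag).text = resource[tag]
    for name, values in elements.items():
        element = ET.SubElement(res, "PROFILE_ELEMENT")
        ET.SubElement(element, "ELEMENT_NAME").text = name
        for value in values:
            ET.SubElement(element, "VALUE_INSTANCE").text = value
    return root


# Placeholder resource description; these values are assumptions for illustration.
resource = {
    "RESOURCE_ID": "EXAMPLE_RESOURCE",
    "RESOURCE_TITLE": "Example resource",
    "RESOURCE_DISCIPLINE": "PDS",
    "RESOURCE_AGGREGATION": "EXAMPLE_AGGREGATION",
    "RESOURCE_CLASS": "EXAMPLE_CLASS",
    "RESOURCE_LOCATION_ID": "EXAMPLE_LOCATION",
    "RESULT_MIME_TYPE": "text/html",
}
root = build_profile("EXAMPLE_PROFILE", resource,
                     {"TARGET_NAME": ["MARS"], "DATA_OBJECT_TYPE": ["TABLE"]})
print(ET.tostring(root, encoding="unicode"))
```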

Profile Development Develop a domain generic language Specialize the profile class Profile - One profile to one resource (e.g. catalog) Inventory - One profile to many resources (e.g. catalog entries) Minimized profile element Dictionary - One profile to one discipline Comprehensive profile elements aliases meanings

Profile Development
Develop a domain generic language
Profile element hierarchy (given a domain data dictionary)
Dictionary
  keywords - all keywords in domain
  keyword values - union of all values in domain
  e.g. TARGET_NAME = {ADRASTEA, …, VENUS}
Profile
  keywords - all keywords associated with a domain resource
  keyword values - union of all values in domain resource
  e.g. TARGET_NAME = {MARS, DEIMOS, PHOBOS}
Inventory
  keywords - keywords associated with inventory item
  keyword values - values for inventory item
  e.g. TARGET_NAME = {MARS}
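
Because each level holds the union of the values beneath it, the hierarchy implies a containment relation: inventory values are a subset of profile values, which are a subset of dictionary values. A small sketch of that check follows; the function name `consistent_hierarchy` is hypothetical, and the dictionary set is an abbreviated stand-in for the elided list on the slide, not the real enumeration.

```python
from typing import Set


def consistent_hierarchy(dictionary: Set[str], profile: Set[str], inventory: Set[str]) -> bool:
    """True when inventory values fit inside the profile and the profile inside the dictionary."""
    return inventory <= profile <= dictionary


# Abbreviated stand-in for the dictionary-level TARGET_NAME values (assumption).
dictionary_targets = {"ADRASTEA", "DEIMOS", "MARS", "PHOBOS", "VENUS"}
profile_targets = {"MARS", "DEIMOS", "PHOBOS"}  # from the slide
inventory_targets = {"MARS"}                    # from the slide

print(consistent_hierarchy(dictionary_targets, profile_targets, inventory_targets))
```

A check like this could flag an inventory entry whose keyword value was never declared at the profile or dictionary level.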

Profile Development Implement domain specific instances Apply domain generic language to specific domain. E.g. Space/Earth Science data and other resources. Model the domain Planetary Science Data Dictionary Planetary Data System Data Model Entity-Relationship model Capture the meta-data Extracted from PDS metadata repository

Profile Development Implement domain specific instances Model the domain (from the start) Planetary Science Data Dictionary Over 1000 Data Elements spanning Planetary Science Nomenclature Standard Meaning, type, ranges, enumerated values Planetary Science Data Model Developed as Planetary Science enterprise E/R model Planetary Science Entities - Spacecraft, Instruments Science Data Entities - Data Products Data Organization Entities - Volumes Management Entities - Nodes, Personnel Implemented as Data Set Catalog in an RDBMS

Profile Development Planetary Science Standards Architecture Metadata and data standards All data products are labeled Data Model relates entities

Profile Development Implement domain specific instances Profile Example - PDS Distributed Inventory System PROFILE_PDS_DIS_V1.3.n Planetary Data System - Distributed Inventory System - Profile V1.0 This profile describes the Planetary Data System (PDS) Distributed Inventory System (DIS)... PROFILE OODT_PDS_DATA_SET_DD_V1.0 PDS_DIS_V1.3.n Planetary Data System - Distributed Inventory System PDS GRANULE+ INVENTORY text/html...

Profile Development Implement domain specific instances Profile Example (cont) - PDS Distributed Inventory System … DATA_OBJECT_TYPE The data_object_type element provides the type... ENUMERATION N/A IMAGE DATA_SET_NAME The data_set_name element identifies a PDS data set. -- example... ENUMERATION N/A VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL... VO2 MARS RADIO SCIENCE SUBSYSTEM RESAMPLED LOS... TARGET_NAME The target_name element provides the names of the targets... ADS.OBJECT_ID ENUMERATION N/A IDA JUPITER

Profile Development Implement domain specific instances Inventory Example - PDS Data Set VO1/VO2-M-VIS-5-DIM-V1.0 VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL... PDS GRANULE+ DATA text/html DATA_SET_NAME VO1/VO2 MARS VISUAL IMAGING SUBSYSTEM DIGITAL IMAGING MODEL... DATA_OBJECT_TYPE TABLE TARGET_NAME MARS VOLUME_ID VO_ VO_2014
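
A profile is meant to carry enough information to decide whether a resource satisfies a query. The sketch below shows one way such a test might look for a single constraint, covering both forms the DTD allows: enumerated `VALUE_INSTANCE` entries and a `MINIMUM_VALUE`/`MAXIMUM_VALUE` range. The function `satisfies`, the XML fragment (which omits `RESOURCE_ATTRIBUTES` for brevity), and the `EXAMPLE_RANGE` element are assumptions for illustration, and only the first range pair of an element is examined.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; TARGET_NAME = MARS follows the inventory example,
# EXAMPLE_RANGE is invented to show the range form.
SAMPLE = """
<RESOURCE>
  <PROFILE_ELEMENT>
    <ELEMENT_NAME>TARGET_NAME</ELEMENT_NAME>
    <VALUE_INSTANCE>MARS</VALUE_INSTANCE>
  </PROFILE_ELEMENT>
  <PROFILE_ELEMENT>
    <ELEMENT_NAME>EXAMPLE_RANGE</ELEMENT_NAME>
    <MINIMUM_VALUE>0</MINIMUM_VALUE>
    <MAXIMUM_VALUE>10</MAXIMUM_VALUE>
  </PROFILE_ELEMENT>
</RESOURCE>
""".strip()


def satisfies(resource: ET.Element, name: str, value: str) -> bool:
    """Check whether the resource's profile elements admit name = value."""
    for element in resource.findall("PROFILE_ELEMENT"):
        if element.findtext("ELEMENT_NAME") != name:
            continue
        # Enumerated form: any listed instance equal to the requested value.
        if any(v.text == value for v in element.findall("VALUE_INSTANCE")):
            return True
        # Range form: numeric comparison against the first minimum/maximum pair.
        low = element.findtext("MINIMUM_VALUE")
        high = element.findtext("MAXIMUM_VALUE")
        if low is not None and high is not None:
            try:
                if float(low) <= float(value) <= float(high):
                    return True
            except ValueError:
                pass  # non-numeric input cannot match a numeric range
    return False


resource = ET.fromstring(SAMPLE)
print(satisfies(resource, "TARGET_NAME", "MARS"))
print(satisfies(resource, "EXAMPLE_RANGE", "11"))
```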

Conclusion Profile Development - Review Choose a common interchange format. (XML) Develop a domain generic language. (X2PL) (XML eXtensible Profile Language) Implement domain specific instances. (Science Resource Profiles) Develop system to manage the results. (Profile Servers)

Conclusion Outstanding Issues Metadata development remains the problem Data Dictionary interoperability is required NASA Data Entity Dictionary Specification Language (DEDSL) semantics being used to promote compatibility. (Subset of ) Need XML specification Data Model interoperability is required

Backup Slides

System Overview
Product Service
Queries existing data systems for scientific datasets and formats the results in the OODT standard format
Configurable with “device drivers” that know how to access special data systems
[Diagram labels: Product Server, Flat file accessor, Unusual accessor, Query, Result]
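
The “device driver” idea can be sketched as a server that dispatches queries to registered accessors and wraps whatever they return in one result shape. This is a hypothetical illustration, not the OODT API: `ProductServer`, `flat_file_accessor`, the substring matching, and the result dictionary are all assumptions standing in for the real accessors and the OODT standard format.

```python
from typing import Callable, Dict, Iterable, List

# An accessor takes a query expression and yields matching records.
Accessor = Callable[[str], Iterable[str]]


class ProductServer:
    """Dispatches queries to pluggable accessors and returns a uniform result."""

    def __init__(self) -> None:
        self._accessors: Dict[str, Accessor] = {}

    def register(self, system: str, accessor: Accessor) -> None:
        self._accessors[system] = accessor

    def query(self, system: str, expression: str) -> Dict[str, object]:
        if system not in self._accessors:
            raise KeyError(f"no accessor registered for {system!r}")
        results = list(self._accessors[system](expression))
        # Stand-in for a standard result format: the same keys for every system.
        return {"system": system, "query": expression, "results": results}


def flat_file_accessor(lines: List[str]) -> Accessor:
    """Build an accessor that does a substring search over flat-file lines."""
    def search(expression: str) -> List[str]:
        return [line for line in lines if expression in line]
    return search


server = ProductServer()
server.register("flat", flat_file_accessor(
    ["MARS image 1", "JUPITER image 2", "MARS table 3"]))  # made-up records
result = server.query("flat", "MARS")
print(result["results"])
```

Supporting an unusual data system then means writing one more accessor function; the server and its callers stay unchanged.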

System Overview
Web Interface
Standard Apache web server with Java servlet engine and JSP processor
User retrieves products with any web browser; no client-side Java is necessary
[Diagram labels: Scientist, Web browser, Web Server, Java Server Page, Query Server, Various systems; GET, Execute Query, HTML Response]

System Overview Server-side Java Components Servlets –Replaces CGI scripts and programs –Fast …very fast JSP –Mix Java source text and HTML in same file Java Beans –Dynamic components to model business –Run-time managed by application server

System Overview Server-side Java Components CORBA –Standard interobject communication facility –Provides remote method invocation features XML –Self-describing data interchange format –Has many applications: Serialization of objects, bridge to CORBA interfaces, data/metadata packaging, direct-to-user presentation with XSL, and more