
Slide 1
National Aeronautics and Space Administration
Jet Propulsion Laboratory
California Institute of Technology

OODT and PDS: An Infrastructure for Management of Distributed Planetary Science Repositories and Databases
October 14, 2004

Steve Hughes, Principal Computer Scientist
Dan Crichton, Principal Computer Scientist
Cyndi Atkinson, Senior Computer Scientist
NASA Jet Propulsion Laboratory

Slide 2: Agenda
• NASA Data Management Challenges
• Object Oriented Data Technology
  - Background
  - Architecture and Design of Software Framework
  - OODT Components
• NASA Planetary Data System
• Other OODT Deployments in Earth Science and Biomedicine
• Wrap Up

Slide 3: Challenges to Data Management: Archiving, Search, Retrieval and Integration
• Space scientists cannot easily locate or use data across the hundreds, if not thousands, of autonomous, heterogeneous, and distributed data systems currently in the space science community.
• Heterogeneous Systems
  - Data Management: RDBMS, ODBMS, home-grown DBMS, binary files
  - Platforms: UNIX, Linux, Windows 3.x/9x/NT, Mac, VMS, …
  - Interfaces: Web, Windows, command line
  - Data Formats: HDF, CDF, NetCDF, PDS, FITS, VICAR, ASCII, …
  - Data Volume: kilobytes to terabytes
• Heterogeneous Disciplines
  - Moving targets and stationary targets
  - Multiple coordinate systems
  - Multiple data object types (images, cubes, time series, spectra, tables, binary, documents)
  - Multiple interpretations of single object types
  - Multiple software solutions to the same problem
  - Incompatible and/or missing metadata

Slide 4: Evolution of Data Systems (Trying to Make Order out of Entropy)
[Diagram: Data System Evolution, from Locally Centralized Data (single mission) to Interoperable & Distributed Databases under the NASA Data Architecture (multi-center, multi-mission environments)]
• Local Database
  - Local tools
  - No data sharing between centers
  - No common data elements (CDEs)
• Limited Data Sharing
  - Manual data sharing
  - Manual correlation
  - Export/import of data
  - Limited CDEs
• Full Data Sharing
  - Location independence
  - Data interchange
  - Data sharing
  - Common CDEs between centers
  - Heterogeneous systems

Slide 5: Object Oriented Data Technology
• Started in 1998 as a research and development task funded at JPL by the Office of Space Science to address the application of information technology to space science
• Provide an infrastructure for distributed data management
• Research methods for interoperability, knowledge management and knowledge discovery
• Develop software frameworks for data management to reuse software, manage risk, reduce cost and leverage IT experience
OODT Initial Focus
• Data archiving: manage heterogeneous data products and resources in a distributed, metadata-driven environment
• Data location and discovery: locate data products across multiple archives, catalogs and data systems
• Data retrieval: retrieve diverse data products from distributed data sources and integrate them

Slide 6: What Does OODT Solve?
• Ties together loosely coupled, distributed, heterogeneous data systems into a virtual data grid (Kesselman, Chervenak)
  - Data production
  - Data distribution
  - Data discovery (including query optimization across highly distributed systems)
  - Data access
• An architectural approach first, an implementation second
  - Adapts to different distributed computing deployments
  - Promotes a REST-style architectural pattern for search and retrieval
• A workflow infrastructure for managing data products
• Scalability in linking together large, distributed data sets

Slide 7: Attributes of Data Grid Architectures
• Coordinate resources that are not subject to centralized control
• Use open source, general-purpose protocols and interfaces for linking systems together
• Deliver non-trivial qualities of service and address differences in qualities of service across systems

Slide 8: Architectural Focus
• Process Architecture
  - Enable data discovery, data capture, and data distribution/exchange across highly distributed environments
  - Metadata-driven architecture
• Data Architecture
  - Common XML schema for managing information about data resources
  - Common XML schema for messaging between distributed services
  - Reuse of domain models within the architecture
• Technology Architecture
  - Information management services that sit on top of traditional middleware: Archive Service (product management), Profile (registry) Services, Product (repository) Services, Query Services, Domain Services
  - Core services that publish well-known interfaces using standard protocols (HTTP, TCP/IP, SOAP, CORBA, …)
  - The architecture fits with current middleware technologies (e.g., Web Services, RMI/J2EE, CORBA, Globus, …)

Slide 9: Distributed Architecture
[Diagram: visualization tools, analysis tools and web search tools connect through the OODT API to the OODT Reusable Data Grid Framework, which links mission, biomedical and engineering data repositories, each also exposed through the OODT API]
1. Science data tools and applications use "APIs" to connect to a virtual data repository.
2. Middleware creates the data grid infrastructure connecting distributed heterogeneous systems and data.
3. Repositories store and retrieve many types of data.

Slide 10: OODT Framework
[Diagram elements: OODT/Science Web Tools, Archive Client, Object Oriented Data Technology Framework (Query Service, Product Service, Profile Service, Archive Service, Bridge to External Services), Other Service 1, Other Service 2, Profile XML, Data Navigation Service, Data System 1, Data System 2]
A data grid framework for capturing, locating and accessing distributed resources across geographically distributed data environments.

Slide 11: OODT Interoperability Layers

Slide 12: End-to-End Space Architecture
[Diagram elements: one or more spacecraft, one or more instruments, a space tracking network, a ground tracking network, commodity space communications systems, commodity space navigation systems, a spacecraft control center, an instrument control center, a science facility]
Source: A. Hooke, NASA/JPL

Slide 13: End-to-End Space Information Architecture
[Diagram elements: spacecraft/lander with scientific instruments, relay satellite, data acquisition and command, mission operations, instrument/sensor operations, operations planning, science planning, data processing and control, data archive (archival and distribution), data/information distribution, data access, data analysis and modeling, science data modeling, science team, external science community, data observation, planetary science infrastructure]

Slide 14: OODT Software Components
• Profile Server: a server-based registry that can either serve local XML profiles or plug into an existing catalog. This component provides resource discovery.
• Product Server: a server component that plugs into existing repositories and serves products. This includes translation services, etc.
• Catalog and Archive Server: a transaction-based server that catalogs and archives products, providing profile and product servers for discovery and distribution.
• Query Server: provides query management across distributed services to enable discovery.

Slide 15: Profile Servers
• A profile is a set of resource definitions, implemented in XML, for data products residing in one or more distributed systems
• Profile servers are distributed servers that manage XML profile definitions as a set of cooperating registries
• Profile servers communicate via XML over standard middleware
• Developed Java classes that map XML profiles to Java objects
[Diagram: Profile Distributed Node Architecture]
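As a rough illustration of mapping XML profiles onto objects, here is a Python sketch that parses a `<profiles>` document shaped like the slide 18 DTD into dataclasses. OODT did this with Java classes; the class and field names, the sample identifiers and the sample values here are hypothetical, and only a handful of the DTD's elements are read.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class ProfileElement:
    """One domain-metadata entry (<profElement>) of a resource profile."""
    name: str
    values: list[str] = field(default_factory=list)  # enumerated <elemValue> entries
    min_value: float | None = None                    # <elemMinValue>, when a range is given
    max_value: float | None = None                    # <elemMaxValue>, when a range is given


@dataclass
class Profile:
    """Object view of one <profile>: its id, the resource it describes, its elements."""
    prof_id: str
    identifier: str
    elements: dict[str, ProfileElement] = field(default_factory=dict)


def _text(parent: ET.Element | None, tag: str) -> str | None:
    """Stripped text of the first child named `tag`, or None if absent or empty."""
    node = parent.find(tag) if parent is not None else None
    return node.text.strip() if node is not None and node.text else None


def _num(text: str | None) -> float | None:
    return float(text) if text is not None else None


def parse_profiles(xml_text: str) -> list[Profile]:
    """Map a <profiles> document onto a list of Profile objects."""
    profiles = []
    for prof in ET.fromstring(xml_text).findall("profile"):
        elements = {}
        for pe in prof.findall("profElement"):
            name = _text(pe, "elemName")
            elements[name] = ProfileElement(
                name=name,
                values=[v.text.strip() for v in pe.findall("elemValue") if v.text],
                min_value=_num(_text(pe, "elemMinValue")),
                max_value=_num(_text(pe, "elemMaxValue")),
            )
        profiles.append(Profile(
            prof_id=_text(prof.find("profAttributes"), "profId"),
            identifier=_text(prof.find("resAttributes"), "Identifier"),
            elements=elements,
        ))
    return profiles


# Hypothetical sample profile; ids and values are made up for illustration.
SAMPLE = """<profiles><profile>
  <profAttributes><profId>PROF-1</profId><profType>profile</profType>
    <profStatusId>active</profStatusId></profAttributes>
  <resAttributes><Identifier>urn:example:storms-west</Identifier>
    <resContext>demo</resContext><resClass>system</resClass></resAttributes>
  <profElement><elemName>country</elemName><elemEnumFlag>T</elemEnumFlag>
    <elemValue>US</elemValue></profElement>
  <profElement><elemName>windspeed</elemName><elemEnumFlag>F</elemEnumFlag>
    <elemMinValue>3</elemMinValue><elemMaxValue>146</elemMaxValue></profElement>
</profile></profiles>"""

if __name__ == "__main__":
    for p in parse_profiles(SAMPLE):
        print(p.identifier, p.elements["windspeed"].max_value)  # urn:example:storms-west 146.0
```

Keeping enumerated values and min/max ranges as separate fields lets a search decide later whether a resource could satisfy a comparison without reparsing the XML.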

Slide 16: Profile Server Architecture
[Class diagram elements: Profile Client, Server, ProfileManager «interface», handlers MyHandler and MyOtherHandler implementing ProfileHandler «interface»; packages jpl.eda.profile and your.server]
ProfileHandler «interface»
    +findProfiles(q: XMLQuery): List
    +get(id: String): Profile
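The handler pattern in this class diagram can be sketched in Python. This is an illustration, not OODT's Java API: `find_profiles` and `get` mirror the diagram's `findProfiles(q: XMLQuery): List` and `get(id: String): Profile`, but the query is simplified to a Python predicate instead of an XMLQuery, profiles are plain dicts, and `InMemoryHandler` and `ProfileServer` are hypothetical names.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from typing import Callable, Optional

# Simplified stand-in for XMLQuery: a predicate over a profile's metadata.
Query = Callable[[dict], bool]


class ProfileHandler(ABC):
    """Plug-in that answers profile requests from one local source."""

    @abstractmethod
    def find_profiles(self, query: Query) -> list[dict]: ...

    @abstractmethod
    def get(self, profile_id: str) -> Optional[dict]: ...


class InMemoryHandler(ProfileHandler):
    """Hypothetical handler serving profiles held in a dict (e.g. loaded from XML files)."""

    def __init__(self, profiles: dict[str, dict]):
        self._profiles = profiles

    def find_profiles(self, query: Query) -> list[dict]:
        return [p for p in self._profiles.values() if query(p)]

    def get(self, profile_id: str) -> Optional[dict]:
        return self._profiles.get(profile_id)


class ProfileServer:
    """Sends each request to every registered handler and merges the answers."""

    def __init__(self) -> None:
        self._handlers: list[ProfileHandler] = []

    def add_handler(self, handler: ProfileHandler) -> None:
        self._handlers.append(handler)

    def find_profiles(self, query: Query) -> list[dict]:
        results: list[dict] = []
        for handler in self._handlers:
            results.extend(handler.find_profiles(query))
        return results

    def get(self, profile_id: str) -> Optional[dict]:
        # The first handler that knows the id answers.
        for handler in self._handlers:
            profile = handler.get(profile_id)
            if profile is not None:
                return profile
        return None


if __name__ == "__main__":
    server = ProfileServer()
    server.add_handler(InMemoryHandler({"p1": {"id": "p1", "target": "Jupiter"}}))
    server.add_handler(InMemoryHandler({"p2": {"id": "p2", "target": "Mars"}}))
    print(server.find_profiles(lambda p: p["target"] == "Mars"))
```

Because the server depends only on the abstract interface, a site can add a handler for its own catalog without changing the server.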

Slide 17: Profile Model for Describing Distributed Resources

Slide 18: XML Profile Definition
    <!ELEMENT profiles (profile*)>
    <!ELEMENT profile (profAttributes, resAttributes, profElement*)>

Profile housekeeping information:
    <!ELEMENT profAttributes (profId, profVersion?, profType, profStatusId,
        profSecurityType?, profParentId?, profChildId*, profRegAuthority?,
        profRevisionNote*, profDataDictId?)>

Common attributes for describing resources:
    <!ELEMENT resAttributes (Identifier, Title?, Format*, Description?, Creator*,
        Subject*, Publisher*, Contributor*, Date*, Type*, Source*, Language*,
        Relation*, Coverage*, Rights*, resContext+, resAggregation?, resClass,
        resLocation*)>

Domain metadata:
    <!ELEMENT profElement (elemId?, elemName, elemDesc?, elemType?, elemUnit?,
        elemEnumFlag, (elemValue* | (elemMinValue, elemMaxValue)), elemSynonym*,
        elemObligation?, elemMaxOccurrence?, elemComment?)>
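A minimal sketch of producing a profile whose children follow the sequence order this DTD requires, using Python's standard `xml.etree.ElementTree`. Only mandatory elements are emitted; the constant values (`profile`, `active`, and `T`/`F` for `elemEnumFlag`) and the function name are assumptions for illustration, and the sketch does not validate against the DTD.

```python
import xml.etree.ElementTree as ET


def _add(parent, tag, text=None):
    """Append a child element, optionally with text."""
    child = ET.SubElement(parent, tag)
    if text is not None:
        child.text = str(text)
    return child


def build_profile(prof_id, identifier, context, res_class, elements):
    """Build a <profile> whose children appear in the DTD's sequence order.

    elements: list of (name, [values]) for enumerations or (name, (min, max)) for ranges.
    """
    profile = ET.Element("profile")

    # Housekeeping: profId, profType and profStatusId are the required members.
    prof = _add(profile, "profAttributes")
    _add(prof, "profId", prof_id)
    _add(prof, "profType", "profile")      # hypothetical value
    _add(prof, "profStatusId", "active")   # hypothetical value

    # Resource attributes: Identifier first, then resContext+ and resClass (optional ones omitted).
    res = _add(profile, "resAttributes")
    _add(res, "Identifier", identifier)
    _add(res, "resContext", context)
    _add(res, "resClass", res_class)

    # Domain metadata: elemName, elemEnumFlag, then either values or a min/max pair.
    for name, spec in elements:
        el = _add(profile, "profElement")
        _add(el, "elemName", name)
        if isinstance(spec, tuple):
            _add(el, "elemEnumFlag", "F")
            _add(el, "elemMinValue", spec[0])
            _add(el, "elemMaxValue", spec[1])
        else:
            _add(el, "elemEnumFlag", "T")
            for value in spec:
                _add(el, "elemValue", value)
    return profile


if __name__ == "__main__":
    demo = build_profile("PROF-1", "urn:example:storms-west", "demo", "system",
                         [("country", ["US"]), ("windspeed", (3, 146))])
    print(ET.tostring(demo, encoding="unicode"))
```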

Slide 19: Searching Resource Descriptions to Find Resources
Query: "country = US and windspeed > 120"
[Figure: two resource profiles compared against the query]
• urn:eda:rmi:Western…: country US; state WA, CA, …; windspeed 3 … 146
• urn:eda:rmi:Southern…: country US; state LA, TX, …; windspeed 1 … 89
Matches! The Western profile satisfies the query (its windspeed reaches 146, above 120); the Southern profile does not (its windspeed tops out at 89).
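The matching on this slide can be worked through in code. This sketch assumes that an enumerated element matches when any listed value satisfies the comparison, and that the windspeed numbers are a (min, max) range, so a profile matches `windspeed > 120` when its maximum exceeds 120. That reading is our interpretation of the slide; the operator set, the profile names and the function names are hypothetical.

```python
import operator

# Comparison operators this sketch understands (a real query language supports more).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def could_match(spec, op, value):
    """Could a resource whose profile element is `spec` hold data satisfying `op value`?

    spec is either a list of enumerated values or a (min, max) tuple.
    """
    if isinstance(spec, tuple):
        lo, hi = spec
        if op == "=":
            return lo <= value <= hi
        if op == ">":
            return hi > value   # some value in [lo, hi] exceeds `value`
        if op == "<":
            return lo < value   # some value in [lo, hi] is below `value`
        raise ValueError(f"unsupported operator {op!r}")
    return any(OPS[op](v, value) for v in spec)


def search_profiles(profiles, terms):
    """Names of profiles satisfying every (element, op, value) term: an AND query."""
    return [
        name
        for name, elems in profiles.items()
        if all(elem in elems and could_match(elems[elem], op, value)
               for elem, op, value in terms)
    ]


# Profiles abbreviated from the slide; windspeed read as a (min, max) range.
PROFILES = {
    "Western": {"country": ["US"], "state": ["WA", "CA"], "windspeed": (3, 146)},
    "Southern": {"country": ["US"], "state": ["LA", "TX"], "windspeed": (1, 89)},
}
QUERY = [("country", "=", "US"), ("windspeed", ">", 120)]

if __name__ == "__main__":
    print(search_profiles(PROFILES, QUERY))  # ['Western']
```

A profile match only says a resource might hold qualifying data; the actual records are fetched in a later product query.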

Slide 20: Scalability of Searching Distributed Resources
[Diagram: a distributed query fans out across profile servers (A, B, C, E, F), each holding resource descriptions that point to distributed data products]
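A sketch of sending one query to several profile servers in parallel and merging the answers, using Python's `concurrent.futures`. Servers are modeled as plain callables (an assumption; real servers are reached over middleware), and returning partial results plus a list of failed servers, rather than failing the whole query, is a design choice of this sketch, not a documented OODT behavior.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-in for a remote profile server: query string -> matching resource ids.
ProfileServer = Callable[[str], List[str]]


def distributed_query(servers: Dict[str, ProfileServer], query: str) -> Tuple[List[str], List[str]]:
    """Send `query` to every profile server concurrently.

    Returns (sorted, de-duplicated results; sorted names of servers that raised an error).
    """
    results = set()
    failed: List[str] = []
    with ThreadPoolExecutor(max_workers=max(1, len(servers))) as pool:
        futures = {pool.submit(server, query): name for name, server in servers.items()}
        for future in as_completed(futures):
            try:
                results.update(future.result())
            except Exception:
                # One unreachable node should not sink the whole search.
                failed.append(futures[future])
    return sorted(results), sorted(failed)


if __name__ == "__main__":
    def server_a(q):
        return ["urn:a:1"] if q == "target=Jupiter" else []

    def server_b(q):
        raise ConnectionError("node down")

    def server_c(q):
        return ["urn:c:7", "urn:a:1"]

    print(distributed_query({"A": server_a, "B": server_b, "C": server_c}, "target=Jupiter"))
```

Issuing the requests concurrently means total latency tracks the slowest responding server rather than the sum of all of them.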

Slide 21: Product Server Architecture
[Diagram: distributed product servers, each built on the Java Server Framework, accept XMLQuery requests through HTTP, IIOP, Java and C++ APIs and pass them through a data source interface to dynamically loaded query handlers. One server provides file system access to the THEMIS data set; another provides database access to the GRS data set. Both sit in front of distributed data repositories.]
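The product-server side can be sketched as a server that offers each query to a list of handlers and returns the first product found, tagged with a MIME type. `load_handler` shows one way to load a handler dynamically from a `module:Class` string with `importlib`, assuming the class has a no-argument constructor; that string format, the `Product` type and the in-memory handler are hypothetical stand-ins for OODT's Java query handlers.

```python
from __future__ import annotations

import importlib
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Product:
    mime_type: str  # e.g. "image/jpeg"; tells any client how to interpret the bytes
    data: bytes


class QueryHandler(Protocol):
    """What the server needs from a handler: answer a query or return None."""
    def query(self, q: dict) -> Optional[Product]: ...


def load_handler(spec: str) -> QueryHandler:
    """Instantiate a handler named by a 'package.module:ClassName' string (e.g. from config)."""
    module_name, _, class_name = spec.partition(":")
    return getattr(importlib.import_module(module_name), class_name)()


class ProductServer:
    def __init__(self, handlers: list[QueryHandler]):
        self.handlers = handlers

    def retrieve(self, q: dict) -> Optional[Product]:
        # Offer the query to each handler in turn; the first non-None answer wins.
        for handler in self.handlers:
            product = handler.query(q)
            if product is not None:
                return product
        return None


class InMemoryFileHandler:
    """Hypothetical stand-in for a file-system handler: bytes keyed by product id."""

    def __init__(self, files: dict[str, bytes]):
        self.files = files

    def query(self, q: dict) -> Optional[Product]:
        data = self.files.get(q.get("product_id"))
        return None if data is None else Product("application/octet-stream", data)


if __name__ == "__main__":
    server = ProductServer([InMemoryFileHandler({"img1": b"\x00\x01"})])
    print(server.retrieve({"product_id": "img1"}))
```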

Slide 22: Catalog and Archive Server
• Transaction-based catalog and archive component
  - Distributed: clients initiate a transaction with a server
  - The server executes a series of configurable steps (rule base)
  - If a step is not satisfied, the archive package is rejected
  - A registry is constructed from the metadata in the package
• The CAS plugs into OODT by providing both a profile and a product server interface, so that the catalogued metadata can be searched and the products can be retrieved
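The rule-based transaction can be sketched as an ordered list of named checks that all run before anything is stored. Rule names, the `ArchivePackage` fields and the in-memory catalog and archive are assumptions for illustration; a deployed CAS would persist to real catalog and archive stores and load its steps from configuration rather than code.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

Rule = Callable[[dict], bool]  # a step inspects package metadata and passes or fails


@dataclass
class ArchivePackage:
    product_id: str
    metadata: dict
    payload: bytes


class IngestRejected(Exception):
    """Raised when a configured step is not satisfied."""


@dataclass
class CatalogArchiveServer:
    rules: list[tuple[str, Rule]]                      # ordered, configurable rule base
    catalog: dict[str, dict] = field(default_factory=dict)
    archive: dict[str, bytes] = field(default_factory=dict)

    def ingest(self, pkg: ArchivePackage) -> None:
        # Run every step first; nothing is stored unless all of them pass.
        for name, rule in self.rules:
            if not rule(pkg.metadata):
                raise IngestRejected(f"{pkg.product_id}: step '{name}' not satisfied")
        # Commit: the catalog (registry) entry is built from the package metadata.
        self.catalog[pkg.product_id] = dict(pkg.metadata)
        self.archive[pkg.product_id] = pkg.payload


# Hypothetical rule base.
RULES = [
    ("has target", lambda m: "target" in m),
    ("known mission", lambda m: m.get("mission") in {"Mars Odyssey", "SeaWinds"}),
]

if __name__ == "__main__":
    cas = CatalogArchiveServer(RULES)
    cas.ingest(ArchivePackage("p1", {"target": "Mars", "mission": "Mars Odyssey"}, b"data"))
    print(cas.catalog)
```

Checking every step before the commit keeps a rejected package from leaving a partial catalog entry behind.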

Slide 23: The Search Algorithm
[Sequence diagram: the Query Server sends "target = Jupiter" to the Profile Server, which returns urn:eda:rmi:PS2; the Query Server then sends "target = Jupiter" to PS2, a Product Server, at urn:eda:rmi:PS2]
• First, do a profile query to resolve the resources that could provide the data
  - Tells you where to look (in the form of a URI)
• Then, do a product query based on the profile query results to get the data
  - Actually retrieves the data (in a MIME-compliant format)
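The two-phase algorithm fits in a few lines. Profile and product servers are modeled as Python callables, and the URI-to-server lookup is a plain dict; both are simplifications (a real deployment would presumably resolve a URI such as `urn:eda:rmi:PS2` through its middleware's naming service). The function names and the decision to skip unresolvable URIs are assumptions.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical stand-ins for remote services.
ProfileQuery = Callable[[dict], List[str]]          # query -> URIs that may hold matching data
ProductQuery = Callable[[dict], Tuple[str, bytes]]  # query -> (MIME type, product bytes)


def two_phase_search(query: dict, profile_server: ProfileQuery,
                     product_servers: Dict[str, ProductQuery]) -> List[Tuple[str, str, bytes]]:
    """Phase 1 asks where to look; phase 2 fetches data from each place found."""
    results = []
    for uri in profile_server(query):              # phase 1: profile query -> URIs
        product_server = product_servers.get(uri)
        if product_server is None:                 # URI we cannot reach: skip it
            continue
        mime_type, data = product_server(query)    # phase 2: product query -> MIME-typed data
        results.append((uri, mime_type, data))
    return results


def demo_profile_server(query: dict) -> List[str]:
    # Mirrors the slide: a query for target = Jupiter resolves to PS2.
    return ["urn:eda:rmi:PS2"] if query.get("target") == "Jupiter" else []


def demo_ps2(query: dict) -> Tuple[str, bytes]:
    return "image/jpeg", b"...jupiter image bytes..."  # placeholder product


if __name__ == "__main__":
    hits = two_phase_search({"target": "Jupiter"}, demo_profile_server,
                            {"urn:eda:rmi:PS2": demo_ps2})
    print([(uri, mime) for uri, mime, _ in hits])  # [('urn:eda:rmi:PS2', 'image/jpeg')]
```

Only servers named by the profile query are contacted, so the product query touches a few repositories instead of all of them.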

Slide 24: Conceptual Search/Retrieval Architecture
[Diagram elements: web and desktop interfaces; a web server with plugins and a QueryClient; a Query Server; name servers; Node 1 profile servers over resource catalogs; repository nodes whose product servers serve Node 1, Node 2 … Node N products; an archive node with a product server; XMLQuery labels the links between components]

Slide 25: Software Implementation
• OODT is open source (http://www.openchannelsoftware.com)
• Developed using open source software (e.g., Java/J2EE and XML)
• Implemented reusable, extensible Java-based software components
• Provided messaging as a "plug-in" component that can be replaced independently of the other core components. Messaging components include:
  - CORBA, Java RMI, JXTA, Web Services, etc.
• Provided common client APIs in Java, C++ and HTTP
• Simple installation on a variety of platforms (Windows, Unix, Mac OS X, etc.)
• Used international data architecture standards
  - ISO/IEC 11179: Specification and Standardization of Data Elements
  - Dublin Core Metadata Initiative
  - W3C's Resource Description Framework (RDF) from the Semantic Web community
• Used common XML schemas for all data exchanges
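To illustrate messaging as a replaceable plug-in, this sketch writes client code against an abstract `Transport` and swaps between two toy transports. They stand in for the CORBA, RMI, JXTA or Web Services bindings; the class names, the `"profile"` service name and the JSON wire format are assumptions chosen for brevity.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict, List

Service = Callable[[dict], dict]  # a service takes a request message and returns a reply


class Transport(ABC):
    """Messaging plug-in: delivers a request to a named service and returns its reply."""

    @abstractmethod
    def call(self, service: str, request: dict) -> dict: ...


class InProcessTransport(Transport):
    """Direct calls; stands in for a co-located deployment."""

    def __init__(self, services: Dict[str, Service]):
        self.services = services

    def call(self, service: str, request: dict) -> dict:
        return self.services[service](request)


class SerializingTransport(InProcessTransport):
    """Round-trips every message through text, as a wire protocol would."""

    def call(self, service: str, request: dict) -> dict:
        wire_request = json.dumps(request)
        reply = self.services[service](json.loads(wire_request))
        return json.loads(json.dumps(reply))


class QueryClient:
    """Client code depends only on the Transport interface, never on a concrete middleware."""

    def __init__(self, transport: Transport):
        self.transport = transport

    def find(self, target: str) -> List[str]:
        return self.transport.call("profile", {"target": target})["uris"]


if __name__ == "__main__":
    services = {"profile": lambda req: {"uris": ["urn:eda:rmi:PS2"] if req["target"] == "Jupiter" else []}}
    for transport in (InProcessTransport(services), SerializingTransport(services)):
        print(type(transport).__name__, QueryClient(transport).find("Jupiter"))
```

Swapping the transport changes how messages travel but not what `QueryClient` does, which is the independence the slide describes.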

Slide 26: Operational Tools
• Simple installation
• Remote tools to enable debugging, configuration and software upgrades
  - With role-based access controls
• Monitoring to verify the health of the distributed infrastructure
• Online documentation for
  - Installation procedures
  - APIs
  - White papers, etc.
  - Training

Slide 27: What Is the PDS?
• The Planetary Data System (PDS) is NASA's official planetary science data archive.
• PDS is chartered to ensure that planetary data are archived and available to the scientific community.
  - Publish and disseminate documented data sets for use in scientific analysis.
  - Work with projects to help design, generate, and validate data products for placement in the archive.
  - Develop and maintain archive data standards to ensure future usability.
  - Provide expert scientific help to the user community.
• PDS is a distributed system designed to optimize scientific oversight in the archiving process.

Slide 28: What Has the PDS Accomplished?
• Produced a high-quality, peer-reviewed archive of Solar System exploration data
  - Stored for long-term viability
  - Described by metadata
  - Distributed either online or on CD media
• Developed a robust standards architecture
  - Planetary Science Data Dictionary: provides the "domain of discourse" for the planetary science community.
  - Planetary Community Model: provides formalized descriptions of the entities and their relationships within the planetary science community.
• Developed a science-driven management structure
  - Responsive to the changing mission project environment through distributed, science-discipline-oriented nodes.

Slide 29: PDS Nodes and Institutions (Silos)
• NAIF/JPL
• Small Bodies/UMD
• Atmospheres/New Mexico State
• Geosciences/Washington University
• Planetary Plasma/UCLA
• Rings/Ames
• Radio Science/Stanford
• Central Node/JPL
• Imaging/USGS
• Imaging/JPL

Slide 30: Archive Growth for Planetary Missions

Slide 31: OODT Insertion in the PDS
• Focused research activity on information technology in support of space science data systems
  - Providing a long-term architecture to improve scientists' ability to retrieve data within the PDS
• Refocus the problem away from technology solutions
• Provide and leverage a metadata infrastructure
  - Providing new solutions for data management in order to access and correlate heterogeneous data products archived in distributed, heterogeneous data systems
  - Reusing an existing metadata infrastructure
  - Supporting the PDS distributed node architecture
• The 2005 focus is on infusing the Catalog and Archive Server component to handle the increase in volume from future missions

Slide 32: Planetary Science: Mars Odyssey
• Before October 2002, there was no unified view across the distributed operational planetary science data repositories
  - Science data distributed across the country
• Planetary data archive increasing from 4 TB in 2001 to 120+ TB in 2007
  - Traditional distribution infeasible due to cost and system constraints
  - Mars Odyssey could not be distributed using the traditional method
• OODT has provided the technology for NASA's planetary data management infrastructure to
  - Support online distribution of science data to planetary scientists (products up to 500 MB)
  - Enable interoperability between nine institutions
  - Support real-time access to data products
  - Provide uniform software interfaces to all Mars Odyssey data, allowing scientists and developers to link in their own tools
• Operational October 1, 2002
• Saved NASA over $3M in physical media distribution
OODT has changed the paradigm for planetary science data distribution!
[Image: 2001 Mars Odyssey]
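As a rough check on the archive-growth figures, going from 4 TB to at least 120 TB over the six years from 2001 to 2007 implies a thirtyfold increase, or a compound annual growth rate of about 76% (derived here from the slide's numbers, not stated on the slide):

```latex
\frac{120\ \text{TB}}{4\ \text{TB}} = 30,
\qquad
r = 30^{1/6} - 1 = e^{(\ln 30)/6} - 1 \approx 1.763 - 1 \approx 0.76
```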

Slide 33: OODT Planetary Science Deployed Data Infrastructure

Slide 34: PDS Search/Retrieval Implementation
PDS/OODT profile servers are used to populate the PDS Data Set View results page as well as the Data Set Information page. The profile servers determine the relationships between the various search parameters to produce a list of results, and they extract the relevant information for the specified data set.

Slide 35: PDS Search/Retrieval Implementation
PDS/OODT product servers retrieve data from the appropriate node's collections and provide it to the user. In this example, the SPICE data are located at the Navigation and Ancillary Information Facility (NAIF) node.

Slide 36: Earth Science: SeaWinds Mission
• OODT provided the software component infrastructure for capturing and managing distributed data products and the associated data processing (workflow)
• Used OODT's Catalog and Archive Service component
• Provided "lights out" operations
• Implementation is the basis for future Earth missions, including
  - Ocean Vector Winds
  - InSAR (proposed start 2005)
• Retrofitted QuikSCAT based on SeaWinds success
• OODT was delivered for SeaWinds operations beginning December 2002
• Same core software framework
[Image: SeaWinds on ADEOS II (launched December 2002)]

Slide 37: Bioinformatics: Cancer Research
• OODT's success has led to interagency agreements with both NIH and NCI
• OODT has provided the NCI with a bioinformatics infrastructure for sharing data across the nation
  - Currently deployed at 10 of 31 NCI research institutions for the Early Detection Research Network (EDRN)
  - Providing real-time access to distributed, heterogeneous databases
  - Created a national virtual repository for biospecimens (now an NCI Director Initiative)
  - Now integrating new data sets: validation studies, images, biomarkers, etc.
  - Meets federal security regulations
  - Operational September 2002
• Funded through 2005 (with renewal planned through 2010)
• Same core software framework as deployed in planetary, Earth and engineering applications
[Logo: Cancer Biomarkers Group, Division of Cancer Prevention]

Slide 38: Wrap Up
• More information at http://oodt.jpl.nasa.gov
• Software available at http://www.openchannelsoftware.com
• Many papers and presentations are available
• Think "architectural style" for building "data-intensive systems"
  - How do you address resource discovery?
  - How do you connect highly distributed applications and systems?
  - How do you deal with different information models?
  - How do you deal with information management and discovery across highly diverse computing environments?
  - How does information management fit into grid computing?
  - How do you optimize this environment?
• These are areas of expertise at JPL that have been applied to planetary science, Earth science, engineering, biomedicine, defense, etc.
• Questions:
  Dan Crichton (818.354.9155) Dan.Crichton@jpl.nasa.gov
  Steve Hughes (818.354.9338) Steve.Hughes@jpl.nasa.gov

