A Software Architecture for Highly Data-Intensive Systems Chris A. Mattmann USC Center for Software Engineering Annual Research Review.

A Software Architecture for Highly Data-Intensive Systems
Chris A. Mattmann
USC Center for Software Engineering Annual Research Review, March 2004
Special thanks to Dan Crichton, Steve Hughes, and Sean Kelly for some of the slides!

Overview
- Motivation
- Problem Statement
- OODT: A Software Architecture and Middleware for Data-Intensive Systems
- Evaluation: Science Problems (Planetary Science, Cancer Research)
- Conclusion

Motivation

Problem Statement
Information integration in data-intensive systems is needed to support data access, distribution, processing, and retrieval across existing heterogeneous data sources, such as NASA's Planetary Data System and NCI's Early Detection Research Network. Software and techniques exist to perform information integration, but:
- there is no software re-use;
- there are no design methods to start from;
- there is no mapping of integration techniques to software components, interaction mechanisms, or arrangements of components.
The lack of re-use and of software standards for information integration in data-intensive systems has forced systems to be "built from scratch":
- there is little or no interoperability with other software systems;
- the programmer is almost always "in the loop";
- a new GDS proposal accompanies most new NASA mission proposals.

Our Approach
A software architecture for data-intensive systems:
- Data architecture: a data dictionary and resource profiles.
- Software architecture: components (Product Servers, Profile Servers, Query Servers), a connector (the messaging layer), and configurations of Product/Profile/Query Servers.
...and a middleware implementation based on the software architecture:
- The middleware leverages existing distributed object middleware frameworks such as CORBA and RMI; we're currently working on a SOAP version.
- It is built and maintained at the Jet Propulsion Laboratory (yes, the Mars folks).
Architecture + middleware = OODT (Object Oriented Data Technology): the middleware is being developed at JPL, and the architecture is being formalized at USC-CSE.

Data Dictionary
A common data model containing:
- the data elements the user is interested in querying for;
- the data elements the user would like to retrieve.
Challenge: integrate the linked-in data sources by exploiting the data dictionary structure, i.e. map the common data model to the data source models across the data-intensive system, using a common data element structure (ISO Specification and Standardization of Data Elements, ISO/IEC 11179). This handles the integration of data models across the system, but the software interfaces still need to be integrated.
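To make the mapping step concrete, here is a minimal Python sketch, not OODT's actual API: a common data dictionary of element names, plus a per-source mapping that translates a query expressed in common elements into each source's local field names. All element names, source names, and field names (`SPECIMEN_TYPE`, `site_a`, `spec_type`, and so on) are hypothetical.

```python
from typing import Dict

# Hypothetical common data dictionary: element name -> short definition.
DATA_DICTIONARY: Dict[str, str] = {
    "SPECIMEN_TYPE": "Kind of biospecimen (e.g. serum, tissue)",
    "PARTICIPANT_AGE": "Age of the participant in years",
}

# Hypothetical per-source mappings from common elements to local field names.
SOURCE_MAPPINGS: Dict[str, Dict[str, str]] = {
    "site_a": {"SPECIMEN_TYPE": "spec_type", "PARTICIPANT_AGE": "age_yrs"},
    "site_b": {"SPECIMEN_TYPE": "SampleKind", "PARTICIPANT_AGE": "Age"},
}


def to_local(source: str, common_query: Dict[str, str]) -> Dict[str, str]:
    """Translate {common element: value} into {local field: value} for one source."""
    mapping = SOURCE_MAPPINGS[source]
    local: Dict[str, str] = {}
    for element, value in common_query.items():
        # Reject elements that are not part of the shared data model.
        if element not in DATA_DICTIONARY:
            raise KeyError(f"{element!r} is not in the common data dictionary")
        local[mapping[element]] = value
    return local


if __name__ == "__main__":
    query = {"SPECIMEN_TYPE": "serum"}
    print(to_local("site_a", query))  # {'spec_type': 'serum'}
    print(to_local("site_b", query))  # {'SampleKind': 'serum'}
```

Keeping the mappings as data means that linking in a new source's data model is a matter of adding one entry; as the slide notes, this covers the data models only, and the differing software interfaces still have to be bridged separately.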

Resource Profiles
Resource profiles provide mechanisms for describing data systems, data products, etc., including:
- common data attributes, using Dublin Core data elements (e.g. Title, Creator, Subject) to describe electronic resources;
- mechanisms for describing where the data is located and how to access it;
- domain data elements that are useful for describing the product (e.g. TARGET_NAME, MISSION_NAME, INSTRUMENT_NAME).
They enable "search and retrieval" of distributed data products: a search sent to a Profile Server yields information about the characteristics of distributed resources (e.g. descriptive information about the product, access information).
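The three kinds of information a resource profile carries can be sketched as a small Python data structure, together with the kind of lookup a Profile Server performs. This is a hypothetical illustration rather than OODT's profile format: the field names, the `search` helper, and the `urn:example:` identifier are all made up for the sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResourceProfile:
    """Hypothetical resource profile: what the resource is, where it is, what it holds."""
    # Dublin Core-style descriptive attributes.
    title: str
    creator: str
    subject: str
    # Where the data lives and how to reach it (an identifier the messaging layer resolves).
    resource_urn: str
    # Domain data elements, each with the values this resource can answer for.
    domain_elements: Dict[str, List[str]] = field(default_factory=dict)

    def describes(self, element: str) -> bool:
        """True if this resource carries the given domain element at all."""
        return element in self.domain_elements


def search(profiles: List[ResourceProfile], element: str, value: str) -> List[str]:
    """Return access URNs of profiles whose domain element lists the requested value."""
    return [p.resource_urn for p in profiles if value in p.domain_elements.get(element, [])]


profile = ResourceProfile(
    title="Example imaging archive",
    creator="Example Node",
    subject="Planetary imaging",
    resource_urn="urn:example:product:imaging",
    domain_elements={
        "TARGET_NAME": ["MARS", "PHOBOS"],
        "MISSION_NAME": ["EXAMPLE_MISSION"],
        "INSTRUMENT_NAME": ["EXAMPLE_CAMERA"],
    },
)

print(search([profile], "TARGET_NAME", "MARS"))  # ['urn:example:product:imaging']
```

The search returns where to go (the URN), not the data itself, which mirrors the slide's point that a Profile Server answers with descriptive and access information about resources.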

Resource Profiles Example
[Figure: the query "country = US and windspeed > 120" evaluated against two resource profiles, urn:eda:rmi:Western… (country: US; state: WA, CA; windspeed: …) and urn:eda:rmi:Southern… (country: US; state: LA, TX; windspeed: … 1 89 …), with a "Matches!" annotation.]
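A minimal Python sketch of how such a query might be evaluated against profiles, assuming (hypothetically) that each profile lists representative values per keyword and that a profile matches when every AND-ed predicate is satisfied by at least one listed value. The profile URNs and windspeed numbers below are invented for illustration and are not the values from the slide.

```python
import operator
from typing import Callable, Dict, List, Tuple, Union

Value = Union[str, float]
Profile = Dict[str, List[Value]]
Predicate = Tuple[str, str, Value]  # (keyword, operator symbol, value)

# Comparison operators a predicate may use.
OPS: Dict[str, Callable[[Value, Value], bool]] = {
    "=": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def may_satisfy(profile: Profile, predicate: Predicate) -> bool:
    """True if at least one value listed for the keyword satisfies the predicate."""
    keyword, op, value = predicate
    return any(OPS[op](listed, value) for listed in profile.get(keyword, []))


def matches(profile: Profile, predicates: List[Predicate]) -> bool:
    """A profile matches an AND-only query when every predicate may be satisfied."""
    return all(may_satisfy(profile, p) for p in predicates)


# "country = US and windspeed > 120" expressed as AND-ed predicates.
query: List[Predicate] = [("country", "=", "US"), ("windspeed", ">", 120.0)]

# Hypothetical profiles; the windspeed values are invented for this sketch.
profile_a: Profile = {"country": ["US"], "windspeed": [45.0, 130.0]}
profile_b: Profile = {"country": ["US"], "windspeed": [30.0, 95.0]}

profiles = {"urn:example:a": profile_a, "urn:example:b": profile_b}
hits = [urn for urn, p in profiles.items() if matches(p, query)]
print(hits)  # ['urn:example:a']
```

Matching against profiles rather than the underlying data lets the query be routed only to resources that could plausibly answer it; the actual records are fetched afterwards.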

Components
- Product Server: responsible for abstracting heterogeneous data source interfaces. A Product Server is attached to each data source that is integrated, and provides a common query interface across heterogeneous data sources.
- Profile Server: describes data resources using resource profiles, allowing data resources to be discovered and located at query time.
- Query Server: ties it all together. It uses Profile Servers to discover data resources that could potentially satisfy a query, queries the discovered data resources (such as Product Servers), and collects the obtained data products to return to the user.
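The division of labour among the three components can be sketched in Python as follows. This is a hypothetical, in-process illustration, not OODT's Java middleware: the class and method names, the two toy data sources, and the `urn:example:` identifiers are assumptions made for the sketch.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Set, Tuple


class ProductServer(ABC):
    """Common query interface in front of one data source (hypothetical API)."""

    @abstractmethod
    def query(self, keyword: str, value: str) -> List[dict]:
        """Return records whose `keyword` equals `value`."""


class DictBackedServer(ProductServer):
    """Wraps a source that stores records in a dict keyed by record id."""

    def __init__(self, records: Dict[str, dict]) -> None:
        self.records = records

    def query(self, keyword: str, value: str) -> List[dict]:
        return [r for r in self.records.values() if r.get(keyword) == value]


class RowBackedServer(ProductServer):
    """Wraps a source that stores (target, filename) rows with no field names."""

    def __init__(self, rows: List[Tuple[str, str]]) -> None:
        self.rows = rows

    def query(self, keyword: str, value: str) -> List[dict]:
        if keyword != "TARGET_NAME":  # this source only knows about targets
            return []
        return [{"TARGET_NAME": t, "file": f} for t, f in self.rows if t == value]


class ProfileServer:
    """Maps each resource URN to the keyword values it can answer for."""

    def __init__(self, profiles: Dict[str, Dict[str, Set[str]]]) -> None:
        self.profiles = profiles

    def discover(self, keyword: str, value: str) -> List[str]:
        return [urn for urn, p in self.profiles.items() if value in p.get(keyword, set())]


class QueryServer:
    """Finds candidate resources via a Profile Server, then queries and merges them."""

    def __init__(self, profile_server: ProfileServer,
                 product_servers: Dict[str, ProductServer]) -> None:
        self.profile_server = profile_server
        self.product_servers = product_servers

    def query(self, keyword: str, value: str) -> List[dict]:
        results: List[dict] = []
        for urn in self.profile_server.discover(keyword, value):
            results.extend(self.product_servers[urn].query(keyword, value))
        return results


servers: Dict[str, ProductServer] = {
    "urn:example:dict": DictBackedServer({"p1": {"TARGET_NAME": "MARS", "file": "a.img"}}),
    "urn:example:rows": RowBackedServer([("MARS", "b.img"), ("MOON", "c.img")]),
}
profile_server = ProfileServer({
    "urn:example:dict": {"TARGET_NAME": {"MARS"}},
    "urn:example:rows": {"TARGET_NAME": {"MARS", "MOON"}},
})
qs = QueryServer(profile_server, servers)
print(qs.query("TARGET_NAME", "MARS"))
# [{'TARGET_NAME': 'MARS', 'file': 'a.img'}, {'TARGET_NAME': 'MARS', 'file': 'b.img'}]
```

The Query Server never sees how either source stores its data; each Product Server hides that behind the same `query` signature, which is the abstraction the slide assigns to Product Servers.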

Connectors
Messaging layer:
- Each OODT component registers itself with a component registry, which allows components to define and provide services; components are identified by unique URNs.
- It transfers the OODT Query Object, which contains an OODT-style query ((keyword = value) predicates joined by logical operators such as AND and OR) and the result list to be populated.
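A toy, in-process Python sketch of the connector's two roles: a registry that maps component URNs to handlers, and a query object carrying (keyword = value) predicates plus a result list that the receiving component fills in. The `QueryObject` and `ComponentRegistry` classes, the simplified query syntax (a single connective per query), and the sample records are hypothetical stand-ins, not OODT's real classes or wire format.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class QueryObject:
    """Hypothetical stand-in for an OODT-style query: predicates plus a result list."""
    predicates: List[Tuple[str, str]]                  # (keyword, value) pairs
    connective: str = "AND"                            # "AND" or "OR"
    results: List[dict] = field(default_factory=list)  # populated by the receiver

    @classmethod
    def parse(cls, text: str) -> "QueryObject":
        """Parse 'K1 = V1 AND K2 = V2' or 'K1 = V1 OR K2 = V2' (one connective per query)."""
        connective, parts = "AND", [text]
        for candidate in ("AND", "OR"):
            split = text.split(f" {candidate} ")
            if len(split) > 1:
                connective, parts = candidate, split
                break
        predicates = []
        for part in parts:
            keyword, value = part.split("=", 1)
            predicates.append((keyword.strip(), value.strip()))
        return cls(predicates, connective)

    def accepts(self, record: dict) -> bool:
        """Evaluate the predicates against one record using the query's connective."""
        hits = [record.get(k) == v for k, v in self.predicates]
        return all(hits) if self.connective == "AND" else any(hits)


class ComponentRegistry:
    """Maps component URNs to handlers; a toy stand-in for the messaging layer."""

    def __init__(self) -> None:
        self._components: Dict[str, Callable[[QueryObject], None]] = {}

    def register(self, urn: str, handler: Callable[[QueryObject], None]) -> None:
        self._components[urn] = handler

    def send(self, urn: str, query: QueryObject) -> QueryObject:
        self._components[urn](query)  # the receiver appends to query.results
        return query


DATA = [
    {"TARGET_NAME": "MARS", "MISSION_NAME": "M1"},
    {"TARGET_NAME": "MOON", "MISSION_NAME": "M2"},
]


def product_server(query: QueryObject) -> None:
    query.results.extend(r for r in DATA if query.accepts(r))


registry = ComponentRegistry()
registry.register("urn:example:product:demo", product_server)
q = registry.send("urn:example:product:demo",
                  QueryObject.parse("TARGET_NAME = MARS OR MISSION_NAME = M2"))
print(len(q.results))  # 2
```

Sending the query object itself, with its result list travelling alongside, means the caller gets back one self-describing message; a real messaging layer would serialize it over CORBA, RMI, or (as planned) SOAP rather than call a local function.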

Configurations: Example

Configurations: Example (2)

Configurations: Example (3)

Planetary Science
Planetary Data System (PDS):
- The official NASA "active" archive for all planetary data; data ingestion is required as part of the Announcement of Opportunity (AO) for a mission.
- 9 nodes, with data located at discipline sites.
- A common data architecture, but different data systems located at the sites.
- Prior to October 2002, there was no ability to find and share data between PDS nodes: data was distributed via CD-ROM, with limited electronic distribution.

OODT PDS Deployment

Early Detection Research Network
OODT's success has led to interagency agreements with both NIH and NCI. OODT has provided the NCI with a bioinformatics infrastructure for sharing data across the nation:
- currently deployed at 10 of 31 NCI research institutions for the Early Detection Research Network (EDRN);
- providing real-time access to distributed, heterogeneous databases;
- created a national virtual repository for biospecimens (now an NCI Director Initiative);
- now integrating new datasets: validation studies, images, biomarkers, etc.;
- meets federal security regulations;
- became operational in September 2002.
It uses the same core software framework as deployed in planetary, earth, and engineering domains.

OODT EDRN Deployment

Conclusion
OODT is:
- a novel software architecture for describing the integration, search, retrieval, and discovery of heterogeneous data stored in heterogeneous domain data sources in data-intensive systems;
- a reference implementation of that software architecture: Java-based middleware, with C++, Perl, Python, and PHP client APIs;
- a process for annotating and creating standard metadata models that describe heterogeneous data based on data standards (Dublin Core, ISO-11179).

Questions?
Contacts:
- OODT website:
- Principal Investigator: Dan Crichton
- Co-Investigator: Steve Hughes
- Programmer/Research Grunt: me
Thanks for your attention!

Backup Slides

Object Oriented Data Technology
Object-Oriented Data Technology (OODT) was funded in 1998 by NASA's Office of Space Science to develop a national software framework for sharing data across heterogeneous, distributed data repositories. The goals were to develop:
- a common data and software framework to enable data sharing across multiple science and engineering disciplines;
- a reusable software architecture across data management projects;
- reusable software components with common interfaces;
- interfaces that enable new components to be plugged in;
- a mechanism to wrap legacy data system components with minimal impact.
OODT should provide:
- science domain independence (use in engineering, science, and biomedicine);
- data location independence (describe what you want, not how or where to get it).