National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids Reagan W. Moore San Diego Supercomputer Center.


1 National Partnership for Advanced Computational Infrastructure, San Diego Supercomputer Center
Data Grids
Reagan W. Moore
San Diego Supercomputer Center, 9500 Gilman Drive, La Jolla, CA 92093-0505
Phone: 858 534-5073
FAX: 858 534-5152
E-mail: moore@sdsc.edu
http://www.npaci.edu/DICE/

2 Topics
Data Grid Requirements
–Data management
–Automation
–Latency hiding
Current technology
–Distributed collections / digital libraries / data grids
State of the art systems
–Virtual data grids / persistent archives
–Emerging Standards

3 Data Management Environments
Code development
–Collaboration, check-out, versioning
Run-time execution
–High performance access, locking, latency hiding, automation, archival storage
Publication
–Discovery, consistency, persistent archives
Are the capabilities required by all three environments compatible?

4 Data Requirements are Met by Collection Technology
Provide three levels of abstraction for data, information, and knowledge management (bits, tagged attributes, relationships)
Automate access through use of information discovery on logical collections that span storage systems
Manage latency by streaming, caching, replication, aggregation, remote proxies, staging
Provide a persistent environment by building a consistent environment over evolving technology
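The idea of information discovery on a logical collection that spans storage systems can be sketched roughly as follows. This is a toy illustration only, not the SRB or MCAT interface: the class names, the attribute keys, the resource names, and the physical paths are all hypothetical. Each logical name carries tagged attributes and a list of replicas, and discovery is a query over the attributes rather than over physical locations.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str  # hypothetical storage system name
    path: str      # hypothetical physical location inside that system


@dataclass
class LogicalObject:
    name: str                                       # logical name, independent of storage
    attributes: dict = field(default_factory=dict)  # tagged attributes
    replicas: list = field(default_factory=list)    # physical copies


class LogicalCollection:
    """Toy catalog mapping logical names to attributes and replicas."""

    def __init__(self):
        self._objects = {}

    def register(self, name, attributes, replicas):
        self._objects[name] = LogicalObject(name, dict(attributes), list(replicas))

    def discover(self, **wanted):
        """Return the logical names whose attributes match every key/value given."""
        return sorted(
            obj.name
            for obj in self._objects.values()
            if all(obj.attributes.get(k) == v for k, v in wanted.items())
        )

    def replicas_of(self, name):
        return list(self._objects[name].replicas)


collection = LogicalCollection()
collection.register(
    "sky/tile-001",
    {"survey": "2MASS", "band": "J"},
    [Replica("archive", "/archive/a/001"), Replica("disk-cache", "/cache/001")],
)
collection.register(
    "sky/tile-002",
    {"survey": "2MASS", "band": "K"},
    [Replica("archive", "/archive/a/002")],
)

# Discovery is by attribute; the caller never names a storage system.
matches = collection.discover(survey="2MASS", band="J")
```

Because a logical name may have several replicas, a client is free to pick the closest or fastest copy, which is one of the latency-management options the slide lists.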

5 Current Technology
Logical data collections
–Storage Resource Broker / Metadata Catalog
Abstract data management by building a data handling system that interoperates with storage systems (file systems, archives, databases)
Abstract information management by building information catalog management that interoperates with information repositories (databases)
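A data handling system that interoperates with several kinds of storage can be pictured as one uniform interface with a driver per back end. The sketch below is an assumption-laden illustration, not the Storage Resource Broker API: `StorageDriver`, `FileSystemDriver`, `InMemoryDriver`, and `DataHandler` are hypothetical names, and the in-memory driver merely stands in for an archive or database.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Uniform interface that every back end implements (hypothetical)."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class FileSystemDriver(StorageDriver):
    """Stores each object as a file under a root directory."""

    def __init__(self, root):
        self.root = root

    def _path(self, key):
        # Flatten the logical name so it is a valid single file name.
        return os.path.join(self.root, key.replace("/", "_"))

    def put(self, key, data):
        with open(self._path(key), "wb") as f:
            f.write(data)

    def get(self, key):
        with open(self._path(key), "rb") as f:
            return f.read()


class InMemoryDriver(StorageDriver):
    """Stand-in for an archive or database back end."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = bytes(data)

    def get(self, key):
        return self._blobs[key]


class DataHandler:
    """Routes reads and writes on logical names to the right driver."""

    def __init__(self):
        self._drivers = {}   # resource name -> driver
        self._location = {}  # logical name -> resource name

    def add_resource(self, name, driver):
        self._drivers[name] = driver

    def write(self, logical_name, data, resource):
        self._drivers[resource].put(logical_name, data)
        self._location[logical_name] = resource

    def read(self, logical_name):
        return self._drivers[self._location[logical_name]].get(logical_name)


handler = DataHandler()
handler.add_resource("unix-fs", FileSystemDriver(tempfile.mkdtemp()))
handler.add_resource("archive", InMemoryDriver())
handler.write("run42/output.dat", b"abc", resource="unix-fs")
handler.write("run42/input.dat", b"xyz", resource="archive")
```

The caller of `read` supplies only the logical name; which storage system holds the bytes is a detail kept inside the handler.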

6 SDSC Storage Resource Broker & Meta-data Catalog
[Diagram; recoverable labels:]
SRB, MCAT, HRM
Archives: HPSS, ADSM, UniTree, DMF
Databases: DB2, Oracle, Postgres
File Systems: Unix, NT, Mac OSX
Application: C, C++, Linux I/O
Other labels: Unix Shell; Dublin Core; Resource, User Defined; Application Meta-data; Remote Proxies; DataCutter; Third-party copy; Java, NT Browsers; Web; Prolog Predicate

7 Information Management Projects
Digital Libraries
–NSF Digital Library Initiative, Phase II - UCSB, Stanford
–NLM Digital Embryo digital library - GMU
–NPACI Digital Sky - Caltech 2MASS sky survey
–California Digital Library - AMICO
–NSF National SMETE Digital Library - UCAR / DLESE
Grid Environments
–NASA Information Power Grid - NASA Ames
–DOE Data Visualization Corridor - LLNL
–DOE Particle Physics Data Grid - Babar
–NSF Grid Physics Network - U Fl
Persistent Archives
–NARA Persistent Archive
–NHPRC - Scalable archives

8 Data Grids
Data Grid - links multiple data collections
Separate name spaces
Separate administration domains
Heterogeneous database instances
Stage data from collection into the data grid
[Diagram labels: Database A, Database B, Data grid]
The data grid is itself a collection that provides mechanisms to hide latency and provide a global namespace
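One way to picture a data grid that links collections with separate name spaces is a thin layer that maps a global name onto a collection plus a local name, and stages the result on first access. The sketch below is illustrative only; `Collection`, `DataGrid`, the `siteA`/`siteB` prefixes, and the object names are hypothetical and do not come from the presentation.

```python
class Collection:
    """Toy collection with its own local name space."""

    def __init__(self, objects):
        self._objects = dict(objects)
        self.reads = 0  # counts fetches, to make the effect of staging visible

    def fetch(self, local_name):
        self.reads += 1
        return self._objects[local_name]


class DataGrid:
    """Global name space over several collections, with a staging area."""

    def __init__(self):
        self._mounts = {}  # grid prefix -> collection
        self._staged = {}  # global name -> staged copy

    def link(self, prefix, collection):
        self._mounts[prefix] = collection

    def get(self, global_name):
        # Serve from the staging area when possible to hide access latency.
        if global_name in self._staged:
            return self._staged[global_name]
        # "/siteA/survey/img1" -> prefix "siteA", local name "survey/img1"
        prefix, _, local_name = global_name.lstrip("/").partition("/")
        data = self._mounts[prefix].fetch(local_name)
        self._staged[global_name] = data
        return data


# Two collections that reuse the same local name without conflict.
db_a = Collection({"survey/img1": b"A1"})
db_b = Collection({"img1": b"B1"})

grid = DataGrid()
grid.link("siteA", db_a)
grid.link("siteB", db_b)

first = grid.get("/siteA/survey/img1")   # fetched from Database A, then staged
second = grid.get("/siteA/survey/img1")  # served from the staging area
other = grid.get("/siteB/img1")
```

The prefix keeps the two name spaces separate while still giving every object a single global name, and the staged copy means a repeated request does not touch the underlying collection again.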

9 State-of-the-art Data Management
Provide knowledge management abstraction
–Abstract the processes that create the derived data product (Virtual data grid)
–Abstract the collection formation used to organize the derived data products (Persistent Archive)
A persistent archive is a virtual data grid in which the derived data products are data collections
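The notion of abstracting the process that creates a derived data product can be sketched as a catalog that stores a recipe (a function and its inputs) for each product and materializes the product only when it is requested. This is a minimal sketch under that reading of the slide; `VirtualDataGrid` and its methods are hypothetical names, and real systems track far more (provenance, versions, execution sites).

```python
class VirtualDataGrid:
    """Toy catalog of derived products described by the process that makes them."""

    def __init__(self):
        self._recipes = {}       # product name -> (function, input names)
        self._materialized = {}  # product name -> stored value

    def add_raw(self, name, value):
        """Register data that already exists."""
        self._materialized[name] = value

    def define(self, name, func, inputs):
        """Describe how a derived product is created, without creating it."""
        self._recipes[name] = (func, tuple(inputs))

    def get(self, name):
        """Return the product, deriving it (and its inputs) on first request."""
        if name not in self._materialized:
            func, inputs = self._recipes[name]
            self._materialized[name] = func(*(self.get(i) for i in inputs))
        return self._materialized[name]

    def is_materialized(self, name):
        return name in self._materialized


vdg = VirtualDataGrid()
vdg.add_raw("raw", [3, 1, 2])
vdg.define("sorted", sorted, ["raw"])
vdg.define("maximum", lambda xs: xs[-1], ["sorted"])

# Nothing derived exists yet; requesting "maximum" also derives "sorted".
exists_before = vdg.is_materialized("maximum")
largest = vdg.get("maximum")
```

Because the recipe is stored rather than only the result, a product can be re-derived later on different technology, which is the link the slide draws to persistent archives.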

10 Standards
Object Management Group - OMG
–Model Driven Architecture for platform independent models of services
  Platform dependent models transform an abstract representation into CORBA, Java, C, ….
  Builds upon Unified Modeling Language (UML)
  Manages life cycle for software services
–Common Warehouse Metamodel
  Provides abstract representation for collections that can be used to migrate collections to alternate databases
  Builds upon a subset of UML

11 Standards
World Wide Web Consortium - W3C
–Semantic Web for natural language queries to collections
–Builds upon the DARPA Agent Markup Language for services, and logic manipulation languages (DAML-L, OIL)
–Uses Resource Description Framework and XML

12 Standards
ISO
–Topic maps manage relationships between concept spaces and collection attributes
–Provide mechanisms to manage semantic interoperability
Global Grid Forum
–Provides authentication systems, data handling systems, execution environments

13 Knowledge Based Data Grids
[Diagram; recoverable labels:]
Levels: Knowledge, Information, Data
Services: Ingest Services, Management, Access Services
Other labels: Attributes, Semantics; Relationships Between Concepts; Fields, Containers, Folders; Knowledge Repository for Rules; Information Repository; Storage (Replicas, Persistent IDs); Knowledge or Topic-Based Query / Browse; Attribute-based Query; Feature-based Query; (Model-based Access); (Data Handling System - SRB); MCAT/HDF; Grids; XML DTD; SDLIP; XTM DTD; Rules - KQL

14 Data Intensive Computing Environment Group
Staff: Reagan Moore, Chaitan Baru, Sheau Yen Chen, Charles Cowart, Amarnath Gupta, George Kremenek, Bertram Ludäscher, Richard Marciano, Arcot Rajasekar, Abe Singer, Michael Wan, Ilya Zaslavsky, Bing Zhu
Students - GSRA: Martin Kuhl, Liying Sui, Yang Yu, Valter Crescenzi
Students - Undergrad Interns: Peter Shin, Roman Olshanowsky, Shabbar Tambawala, Pratik Mukhopadhyay, +/- NN

15 Further Information
http://www.npaci.edu/DICE

