Presentation is loading. Please wait.

Presentation is loading. Please wait.

National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.

Similar presentations


Presentation on theme: "National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore."— Presentation transcript:

1 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore General Atomics, Inc. San Diego Supercomputer Center moore@sdsc.edu http://www.npaci.edu/DICE/

2 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Topics Data management systems –Data collections, digital libraries Distributed data management –Data grids Persistent data management –Persistent archives Common infrastructure for data management

3 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Collections Define the context for describing a collection of digital entities –Context specified by metadata attributes –Provenance, origin of the digital entities –Administrative, location of the digital entities –Technical, purpose of the digital entities Support organization of attributes as hierarchy of sub-collections

4 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Digital Libraries Provide services on the data collection –Ingestion, loading of attribute values –Extensibility, definition of new attributes –Discovery, queries on attributes –Browsing, hierarchical listing –Presentation, formatting specified data models

5 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids Manage data in a distributed environment –Logical name space, provide global identifier –Data access, storage system abstraction –Replication, disaster back up –Uniform access, common API across file systems, archives, and databases –Single sign-on, authenticate across administration domains

6 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Archives Manage technology evolution –Storage system abstraction, support data migration across storage systems –Information repository abstraction, support catalog migration to new databases –Logical name space, support global persistent identifier

7 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Storage Resource Broker Integration of collection-based management of digital entities, with –Remote data access through storage system abstraction –Catalog access through information repository abstraction –Automation through collection-owned data

8 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Capabilities Support legacy systems Integrate archives with file systems Share distributed data Maintain persistent collection Control data access

9 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Digital Entities Digital entities are “images of reality”, made of –Data, the bits (zeros and ones) put on a storage system –Information, the attributes used to assign semantic meaning to the data –Knowledge, the structural relationships described by a data model Every digital entity requires information and knowledge to correctly interpret and display

10 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Digital Entities Files –Text documents, images, spread sheets, binary files URLs Database query commands Databases

11 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Digital Entities Register digital entities into a catalog Assign metadata to describe each digital entity Separate management of the associated data bits from management of the metadata Support manipulation of each digital entity data type

12 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Technology Management Old Storage System New Operating System Old Application Digital Object Old Display System Wrap Storage SystemWrap Display System Migrate Encoding Format

13 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Preservation of Data Migration –Preserve the data bits –Preserve the digital entity name –Preserve the information and knowledge content for presentation by new applications

14 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Migration Advantages By migrating the digital entity encoding format to new standards, more sophisticated technologies can be applied to express the information and knowledge content inherent in collections of digital entities. Requires the ability to associate data model with digital entity

15 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Uniform API Provide common access semantics Map from the interface preferred by your application to the interfaces required by legacy storage systems

16 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Unix Shell Java, NT Browsers Web WSDL GridFTP SDSC Storage Resource Broker & Meta-data Catalog Common APIs Archives HPSS, ADSM, UniTree, DMF Databases DB2, Oracle, Postgres File Systems Unix, NT, Mac OSX Application HRM Access APIs Servers Storage Abstraction Catalog Abstraction Databases DB2, Oracle, Sybase C, C++, Libraries Logical Name Space Latency Management Data Transport Metadata Transport Consistency Management / Authorization-Authentication Prime Server Linux I/O DLL / Python

17 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Discovery Transparencies Naming transparency - find a data set without knowing its name –Map from attributes to a global file name Location transparency - access a data set without knowing where it is –Map from global file name to local file name Access transparency - access a data set without knowing the type of storage system –Federated client-server architecture

18 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Unix Shell Java, NT Browsers Web WSDL GridFTP SDSC Storage Resource Broker & Meta-data Catalog Transparencies Archives HPSS, ADSM, UniTree, DMF Databases DB2, Oracle, Postgres File Systems Unix, NT, Mac OSX Application HRM Access APIs Servers Storage Abstraction Catalog Abstraction Databases DB2, Oracle, Sybase C, C++, Libraries Logical Name Space Latency Management Data Transport Metadata Transport Consistency Management / Authorization-Authentication Prime Server Linux I/O DLL / Python

19 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Collection Maintain authenticity –Authenticate all accesses –Assign roles for access control lists (curation, write, annotate, read) –Manage audit trails of all operations Collection-owned data –All accesses through the data management system

20 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Unix Shell Java, NT Browsers Web WSDL GridFTP SDSC Storage Resource Broker & Meta-data Catalog Persistency Archives HPSS, ADSM, UniTree, DMF Databases DB2, Oracle, Postgres File Systems Unix, NT, Mac OSX Application HRM Access APIs Servers Storage Abstraction Catalog Abstraction Databases DB2, Oracle, Sybase C, C++, Libraries Logical Name Space Latency Management Data Transport Metadata Transport Consistency Management / Authorization-Authentication Prime Server Linux I/O DLL / Python

21 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Preservation (Similar requirements to a data grid) Name transparency –Find a file by attributes (map from attributes to global name) Location transparency –Access a file by a global identifier (map from global to local file name) Access transparency –Use same API to access data in archive or file cache Authenticity –Disaster recovery, replicate data across storage systems –Audit and process management

22 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Unix Shell Java, NT Browsers Web WSDL GridFTP SDSC Storage Resource Broker & Meta-data Catalog Preservation Archives HPSS, ADSM, UniTree, DMF Databases DB2, Oracle, Postgres File Systems Unix, NT, Mac OSX Application HRM Access APIs Servers Storage Abstraction Catalog Abstraction Databases DB2, Oracle, Sybase C, C++, Libraries Logical Name Space Latency Management Data Transport Metadata Transport Consistency Management / Authorization-Authentication Prime Server Linux I/O DLL / Python

23 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Convergence of Technologies Data grids as basis for distributed data management –Federation of distributed resources –Creation of logical name space to automate discovery Distributed data collections –Discovery based on attributes –Distributed data storage systems Digital libraries –Development of services for manipulating, viewing data Persistent archives –Management of technology evolution

24 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Naming Ontologies Concept spaceDiscipline concepts Data gridGlobal Identifier CollectionDiscipline attributes Archive / file systemsLocal file name Data modelAttributes that describe data structure

25 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Knowledge Creation Roadmap Knowledge syntax (consensus) –RDF, XMI, Topic Map Knowledge management (recursive operations) –Oracle parallel database Knowledge manipulation (spatial/procedural rules) –Generation of inference rules and mapping to data models Knowledge generation (scalable inference engine) –Application of inference rules in inference engine

26 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Knowledge Based Data Grid Roadmap Attributes Semantics Knowledge Information Data Ingest Services ManagementAccess Services (Model-based Access) (Data Handling System - SRB) MCAT/HDF Grids XML DTD SDLIP XTM DTD Rules - KQL Information Repository Attribute- based Query Feature-based Query Knowledge or Topic-Based Query / Browse Knowledge Repository for Rules Relationships Between Concepts Fields Containers Folders Storage (Replicas, Persistent IDs)

27 National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Further Information http://www.npaci.edu/DICE


Download ppt "National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore."

Similar presentations


Ads by Google