Data Management Systems Richard Marciano Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center

1 Data Management Systems Richard Marciano Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center

2 Key Data Management Standards Data sharing - data grids Storage Resource Broker - SRB Enables each community to use their preferred API Supports federation between data grids Integrated Rule Oriented Data System - iRODS Automates execution of management policies Data publication - digital libraries DSpace and Fedora support metadata standards OAI-PMH METS Data preservation - persistent archives Integrated Rule Oriented Data System - iRODS RLG/NARA Trusted Repository Assessment criteria OAIS ISADG metadata DFDL data format characterization

3 Data Standards What are your plans of supporting these? International collaboration on the development of iRODS. Integration with other data management solutions including LOCKSS, IBP/Lstore, DataVerse Which is in your opinion the area which lacks standardization most (either because of the absence of standardization or because of insufficient standards)? Need progress in representation information Need DFDL standard

4 Data Management Standards Which (future) standards are seen as important? Virtual Machine technology Workflow virtualization How close is your implementation to the (published) standard(s)? Most standards provide access methods, but not data management Port access standards as they become heavily used Which are the open source extensions to employed standards you had to make and why? Generic interface for manipulating structured information Need the ability to query a structured information resource to read and write internal information Generic interface for characterizing data management policies Need the ability to virtualize management policies across administrative domains

5 Grid Infrastructure Interoperation Challenges Virtualization of trust Support for multiple authentication and authorization mechanisms Data management virtualization Characterization of management policies as server-side workflows Virtualization of workflows Ability to migrate workflows between client-side and server-side What do you suggest are the best ways to tackle these problem areas? Bottom-up interoperability development by the principal software system developers Roadmap document Wiki - What is your funding status in a mid- and long-term perspective? Sustained funding for the next three years (NSF, NARA)

6 William Charles Wentworth ( ) Noted Australian explorer and statesman Ancestry: 8,979 ancestors, 84,628 descents from Charlemagne Cousins: Queen Elizabeth, 10th cousin, 3 removes; George Washington, 11th cousin, 3 removes; Reagan Wentworth Moore, 10th cousin, 4 removes

7 Data Grid Evolution Data grids Infrastructure independence Data sharing through data and trust virtualization SRB - Storage Resource Broker Rule-based data grids Automation of management policies Management virtualization Open source software iRODS - integrated Rule-Oriented Data System

8 Data Management Applications Data grids Share data - organize distributed data as a collection Digital libraries Publish data - support browsing and discovery Persistent archives Preserve data - manage technology evolution Real-time sensor systems Federate sensor data - integrate across sensor streams Workflow systems Analyze data - integrate client- & server-side workflows

9 Generic Infrastructure Data grids organize distributed data into shared collections Persistent name spaces for files, users, storage Collection attributes Provenance, descriptive, system metadata Data grids manage heterogeneous storage systems Standard operations across file systems, tape archives, object ring buffers Enable technology evolution At the point in time when new technology is available, both the old and new systems can be integrated
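
The persistent name space idea on this slide can be illustrated with a small sketch. It assumes a logical name maps to a list of (storage resource, physical path) pairs, so that data can move to new technology while the logical name stays fixed. The class `LogicalNamespace`, its methods, and the resource names are hypothetical and are not part of SRB or iRODS.

```python
class LogicalNamespace:
    """Hypothetical logical-to-physical name mapping (illustration only)."""

    def __init__(self):
        # logical name -> list of (resource, physical path) replicas
        self._replicas = {}

    def register(self, logical, resource, physical):
        """Record one physical replica for a logical name."""
        self._replicas.setdefault(logical, []).append((resource, physical))

    def migrate(self, old_resource, new_resource, rename):
        """Re-point every replica on old_resource to new_resource.

        The logical names are untouched; only the physical side changes.
        """
        for logical, replicas in self._replicas.items():
            self._replicas[logical] = [
                (new_resource, rename(path)) if res == old_resource else (res, path)
                for res, path in replicas
            ]

    def resolve(self, logical):
        """Return the current replicas for a logical name."""
        return list(self._replicas[logical])


ns = LogicalNamespace()
ns.register("/grid/physics/run1.dat", "tape-old", "/vol7/00412")
before = ns.resolve("/grid/physics/run1.dat")

# Technology evolution: the old archive is replaced by a new disk pool.
ns.migrate("tape-old", "disk-new", lambda path: "/pool" + path)
after = ns.resolve("/grid/physics/run1.dat")
```

Users keep asking for `/grid/physics/run1.dat` before and after the migration; only the catalog entry behind that name changes.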

10 Using a Data Grid – in Abstract Ask for data: user asks for data from the data grid. Data delivered: the data is found and returned. Where & how details are hidden.

11 Using a Data Grid - Details User asks for data. Data request goes to iRODS Server. Server looks up information in catalog (Metadata Catalog DB). Catalog tells which iRODS server has data. 1st server asks 2nd for data. The 2nd iRODS server applies rules.
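
The request flow on this slide can be sketched in a few lines. This is a toy model under stated assumptions: the catalog is a dictionary from logical path to server name, and each server either serves the file locally (after applying its rules) or forwards the request. `Catalog`, `Server`, and all method names are hypothetical and do not reflect the actual iRODS server interfaces.

```python
class Catalog:
    """Hypothetical metadata catalog: logical path -> name of the holding server."""

    def __init__(self):
        self.locations = {}

    def locate(self, logical_path):
        return self.locations[logical_path]


class Server:
    """Hypothetical data grid server (not the real iRODS server)."""

    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog
        self.peers = {}   # server name -> Server
        self.store = {}   # logical path -> bytes held locally
        self.log = []     # what this server did, for inspection

    def put(self, logical_path, data):
        self.store[logical_path] = data
        self.catalog.locations[logical_path] = self.name

    def get(self, logical_path):
        # Step 1: look up which server holds the data.
        owner = self.catalog.locate(logical_path)
        if owner != self.name:
            # Step 2: the first server asks the second for the data.
            self.log.append("forward %s to %s" % (logical_path, owner))
            return self.peers[owner].get(logical_path)
        # Step 3: the holding server applies its rules before returning data.
        self.log.append("rules applied for %s" % logical_path)
        return self.store[logical_path]


catalog = Catalog()
first = Server("A", catalog)
second = Server("B", catalog)
first.peers["B"] = second
second.peers["A"] = first

second.put("/zone/home/survey.dat", b"pixels")
result = first.get("/zone/home/survey.dat")  # the user only talks to server A
```

The caller never learns that the bytes came from server B, which is the "where and how details are hidden" point of the previous slide.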

12 Extremely Successful Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, …
Astronomy: Data grid
Bio-informatics: Digital library
Earth Sciences: Data grid
Ecology: Collection
Education: Persistent archive
Engineering: Digital library
Environmental science: Data grid
High energy physics: Data grid
Humanities: Data grid
Medical community: Digital library
Oceanography: Real-time sensor data, persistent archive
Seismology: Digital library, real-time sensor data
Goal has been generic infrastructure for distributed data


14 BaBar High-Energy Physics Stanford Linear Accelerator IN2P3 Lyon, France Rome, Italy San Diego RAL, UK A functioning international Data Grid for high-energy physics Manchester-SDSC mirror Moved over 300 TBs of data Increasing to 5 TBs per day

15 Requirements Driving Evolution Observe that as the size of the shared collections grows, the administrative tasks can become onerous. Data grids provide mechanisms to manage recovery from all errors that occur in the distributed environment Need to minimize labor support through automation of administrative functions File ingestion tasks Verification of desired collection properties Integrity checks and replica management

16 Requirements Driving Evolution Observe that each community has unique management policies User administration File retention & deletion Time-dependent access controls Data distribution and replication File update (versions, backups) Descriptive metadata

17 Requirements Driving Evolution Socialization of collections The creators of the collection have specific properties that they assert the collection will possess Completeness Authoritative sources Authenticity The users of the collection have their own criteria for the properties they expect Socialization is the mapping from creator assertions to user expectations

18 Data Grid Mechanisms Essential components needed for synergism implemented in SRB Infrastructure independence Data and trust virtualization Components needed for specific management policies and processes implemented in iRODS Map policies to rules that control all processes Map processes to standard micro-services

19 Data Management iRODS - integrated Rule-Oriented Data System

20 Rules Rule classes System enforced rules Administrator controlled rules User defined rules Rule execution Atomic rules - executed on each operation invoked by a client Deferred rules - executed at a future time Periodic rules - executed to validate assessment criteria and enforce desired properties (integrity)
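
The difference between deferred and periodic rules can be sketched with a small queue. Atomic rules would simply run inline with the client operation, so they are not queued here. The sketch assumes a simulated integer clock and rules that are plain callables; `RuleQueue` and its methods are hypothetical names, not the iRODS rule queue.

```python
import heapq
import itertools


class RuleQueue:
    """Hypothetical queue for deferred and periodic rules (illustration only)."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker so rules are never compared

    def defer(self, run_at, rule, period=None):
        """Queue a rule for time run_at; a period makes it repeat."""
        if period is not None and period <= 0:
            raise ValueError("period must be positive")
        heapq.heappush(self._heap, (run_at, next(self._counter), rule, period))

    def run_until(self, now):
        """Execute every rule due at or before now, in time order."""
        executed = []
        while self._heap and self._heap[0][0] <= now:
            run_at, _, rule, period = heapq.heappop(self._heap)
            executed.append((run_at, rule()))
            if period is not None:
                # Periodic rules re-queue themselves for the next interval.
                self.defer(run_at + period, rule, period)
        return executed


queue = RuleQueue()
queue.defer(5, lambda: "purge temp files")               # deferred: runs once
queue.defer(0, lambda: "verify checksums", period=10)    # periodic: integrity check
executed = queue.run_until(25)
```

With the clock advanced to 25, the periodic check has fired at times 0, 10, and 20, and the deferred purge once at time 5.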

21 iRODS Rule Syntax Event | Condition | Action-set | Recovery-set Event - triggered by operation or queued rule Condition - composed of tests on any attributes in the persistent state information Action-set - composed from both micro-services and rules Recovery-set - used to ensure transaction semantics and consistent state information Executed by a rule engine installed at each storage location - server side workflows
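
The four-part rule structure can be illustrated with a toy rule engine. This is a sketch, not the iRODS rule language: it assumes each action is paired with one recovery step, and that when an action fails the recoveries of the already completed actions run in reverse order. `Rule`, `fire`, and the example micro-service functions are hypothetical.

```python
class Rule:
    """Hypothetical rule: event | condition | action-set | recovery-set."""

    def __init__(self, event, condition, actions, recoveries):
        self.event = event
        self.condition = condition    # callable(state) -> bool
        self.actions = actions        # list of callable(state)
        self.recoveries = recoveries  # one recovery callable per action


def fire(rules, event, state):
    """Run the first rule matching the event whose condition holds.

    Returns True if every action succeeded, False if the rule was
    rolled back or no rule applied.
    """
    for rule in rules:
        if rule.event != event or not rule.condition(state):
            continue
        completed = []
        try:
            for action, recovery in zip(rule.actions, rule.recoveries):
                action(state)
                completed.append(recovery)
            return True
        except Exception:
            # Undo completed actions in reverse to restore consistent state.
            for recovery in reversed(completed):
                recovery(state)
            return False
    return False


def replicate(state):
    state["replicas"] += 1


def undo_replicate(state):
    state["replicas"] -= 1


def failing_checksum(state):
    raise IOError("storage offline")


def noop(state):
    pass


def has_data(state):
    return state["size"] > 0


broken = [Rule("put", has_data, [replicate, failing_checksum], [undo_replicate, noop])]
state = {"size": 10, "replicas": 1}
ok = fire(broken, "put", state)            # second action fails, first is undone

working = [Rule("put", has_data, [replicate], [undo_replicate])]
state2 = {"size": 10, "replicas": 1}
ok2 = fire(working, "put", state2)         # all actions succeed
```

In the failing case the replica count returns to its starting value, which is the transaction-like behavior the recovery-set is meant to provide.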

22 Micro-Services Challenge is that storage systems do not provide desired processes Have minimal set of standard operations that are performed at the storage system Have actions required by clients such as replication, metadata extraction Create standard micro-services that aggregate storage operations into modules that can be used to implement desired processes.

23 Data Virtualization Map from the actions requested by the access method to a standard set of micro-services. The standard micro-services are mapped to the operations supported by the storage system. [Diagram layers within the Data Grid: Access Interface, Standard Micro-services, Standard Operations, Storage Protocol, Storage System]
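
The layering on this slide can be sketched as follows, assuming each storage system is wrapped by a driver that exposes the same two standard operations (`read` and `write`), and that a micro-service such as replication is built only from those operations. The driver classes and function names are hypothetical; real storage drivers are considerably more involved.

```python
import hashlib
import zlib


class DiskDriver:
    """Hypothetical driver for a file system: stores bytes as-is."""

    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data


class ArchiveDriver:
    """Hypothetical driver for an archive: stores bytes compressed."""

    def __init__(self):
        self._tapes = {}

    def read(self, path):
        return zlib.decompress(self._tapes[path])

    def write(self, path, data):
        self._tapes[path] = zlib.compress(data)


def checksum(driver, path):
    """Micro-service: checksum built from the standard read operation."""
    return hashlib.sha256(driver.read(path)).hexdigest()


def replicate(source, target, path):
    """Micro-service: copy a file between any two drivers and verify it."""
    target.write(path, source.read(path))
    if checksum(source, path) != checksum(target, path):
        raise IOError("replica of %s does not match source" % path)
    return checksum(target, path)


disk = DiskDriver()
archive = ArchiveDriver()
disk.write("/coll/obs.dat", b"sensor readings")
digest = replicate(disk, archive, "/coll/obs.dat")
```

The `replicate` micro-service never needs to know that the archive compresses its contents, because it only touches the standard operations.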

24 integrated Rule-Oriented Data System Client Interface Admin Interface Current State Rule Invoker Micro Service Modules Metadata-based Services Resources Micro Service Modules Resource-based Services Service Manager Consistency Check Module Rule Modifier Module Consistency Check Module Engine Rule Confs Config Modifier Module Metadata Modifier Module Metadata Persistent Repository Consistency Check Module Rule Base

25 Distributed Management System Rule Engine, Data Transport, Metadata Catalog, Execution Control, Messaging System, Execution Engine, Virtualization, Server Side Workflow, Persistent State information, Scheduling, Policy Management

26 Micro-service Classes Test System Workflow control Client iCAT catalog User level invoked by irule Image manipulation

27 Digital Preservation Preservation community is defining the rules needed to assert trustworthiness of a digital repository RLG/NARA - Trustworthy Repositories Audit & Certification: Criteria and Checklist. pub/Main/ReferenceInputDocuments/trac.pdf Defined 105 rules that are being implemented in iRODS

28 RLG/NARA Assessment Example TRAC assessment criteria
90: Verify descriptive metadata and source against SIP template and set SIP compliance flag
91: Verify descriptive metadata against semantic term list
92: Verify status of metadata catalog backup (create a snapshot of metadata catalog)
93: Verify consistency of preservation metadata after hardware change or error

29 Classes of Assessment Criteria Collection properties List properties of associated name spaces Verify properties Compare properties with assertions Collection operations Transform file formats Migrate data Generate audit trails Structured information Parse audit trails to generate compliance reports Apply templates to extract information Apply templates to format state information
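
The "compare properties with assertions" step can be sketched as a small check over a collection. The sketch assumes two asserted properties, chosen here only for illustration: every file has a recorded SHA-256 checksum, and at least a given number of replicas must match it. The data layout and the function `assess` are hypothetical and are not one of the TRAC rules.

```python
import hashlib


def sha256_hex(data):
    return hashlib.sha256(data).hexdigest()


def assess(collection, min_replicas):
    """Return (file name, count of intact replicas) for every file that
    falls short of the asserted minimum number of intact replicas."""
    report = []
    for name, record in sorted(collection.items()):
        intact = [
            replica
            for replica in record["replicas"]
            if sha256_hex(replica) == record["sha256"]
        ]
        if len(intact) < min_replicas:
            report.append((name, len(intact)))
    return report


collection = {
    "a.dat": {"sha256": sha256_hex(b"alpha"), "replicas": [b"alpha", b"alpha"]},
    # One replica of b.dat has been corrupted.
    "b.dat": {"sha256": sha256_hex(b"beta"), "replicas": [b"beta", b"bexa"]},
}
report = assess(collection, min_replicas=2)
```

Run periodically, a check of this shape yields the raw material for a compliance report: an empty list means the asserted properties currently hold.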

30 iRODS Development NSF - SDCI grant Adaptive Middleware for Community Shared Collections iRODS development, SRB maintenance NARA - Transcontinental Persistent Archive Prototype Trusted repository assessment criteria NSF - Ocean Research Interactive Observatory Network (ORION) Real-time sensor data stream management NSF - Temporal Dynamics of Learning Center data grid Management of Institutional Review Board approval

31 iRODS Development Status Current release is version June 2007 Production release will be version 1.0 Fall quarter 2007 International collaborations SHAMAN - University of Liverpool Sustaining Heritage Access through Multivalent ArchiviNg UK e-Science data grid IN2P3 in Lyon, France DSpace policy management

32 Planned Development GSI support Time-limited sessions via a one-way hash authentication Python Client library GUI Browser (AJAX in development) Driver for HPSS (in development) Driver for SAM-QFS Porting to additional versions of Unix/Linux Porting to Windows Support for MySQL as the metadata catalog API support packages based on existing mounted collection driver MCAT to ICAT migration tools Extensible Metadata including Databases Access Interface Zones/Federation Auditing - mechanisms to record and track iRODS persistent state changes

33 For More Information (iRODS Tutorial on Thursday) Reagan W. Moore San Diego Supercomputer Center
