From SRB to IRODS: Policy Virtualization using Rule-Based Data Grids Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center.

1 From SRB to IRODS: Policy Virtualization using Rule-Based Data Grids Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center moore@sdsc.edu http://irods.sdsc.edu http://www.sdsc.edu/srb/

2 Data Grid Evolution Data grids Infrastructure independence Data sharing through data and trust virtualization SRB - Storage Resource Broker Rule-based data grids Automation of management policies Management virtualization Open source software iRODS - integrated Rule-Oriented Data System

3 Data Management Applications
Data grids: Share data - organize distributed data as a collection
Digital libraries: Publish data - support browsing and discovery
Persistent archives: Preserve data - manage technology evolution
Real-time sensor systems: Federate sensor data - integrate across sensor streams
Workflow systems: Analyze data - integrate client- & server-side workflows

4 Generic Infrastructure Data grids organize distributed data into shared collections Persistent name spaces for files, users, storage Collection attributes Provenance, descriptive, system metadata Data grids manage heterogeneous storage systems Standard operations across file systems, tape archives, object ring buffers Enable technology evolution At the point in time when new technology is available, both the old and new systems can be integrated

5 Data Grid Using a Data Grid – in Abstract Ask for data User asks for data from the data grid Data delivered The data is found and returned Where & how details are hidden

6 Using a Data Grid - Details
Diagram labels: iRODS Server, iRODS Server, Metadata Catalog, DB
User asks for data
Data request goes to iRODS Server
Server looks up information in catalog
Catalog tells which iRODS server has data
1st server asks 2nd for data
The 2nd iRODS server applies rules
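
The request flow on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Catalog`, `Server`, `apply_rules`, and the logical path are hypothetical names invented here, not part of the iRODS code base, and the policy check is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Catalog:
    """Hypothetical stand-in for the metadata catalog: logical name -> server name."""
    locations: dict = field(default_factory=dict)

    def locate(self, logical_name):
        return self.locations[logical_name]


@dataclass
class Server:
    """Hypothetical data grid server; not the real iRODS server."""
    name: str
    catalog: Catalog
    peers: dict = field(default_factory=dict)    # server name -> Server
    storage: dict = field(default_factory=dict)  # logical name -> bytes

    def apply_rules(self, logical_name, user):
        # Placeholder policy check; a real grid would evaluate its rule base here.
        return True

    def serve_local(self, logical_name, user):
        # The server that holds the data applies its rules before returning it.
        if not self.apply_rules(logical_name, user):
            raise PermissionError(logical_name)
        return self.storage[logical_name]

    def get(self, logical_name, user):
        # Look up the holding server in the catalog, then serve or forward.
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            return self.serve_local(logical_name, user)
        return self.peers[owner].serve_local(logical_name, user)


catalog = Catalog({"/zone/home/alice/run1.dat": "server_b"})
server_a = Server("server_a", catalog)
server_b = Server("server_b", catalog, storage={"/zone/home/alice/run1.dat": b"payload"})
server_a.peers["server_b"] = server_b

# The user only talks to server_a; where the data lives stays hidden.
result = server_a.get("/zone/home/alice/run1.dat", "alice")
```

The caller never names `server_b`, which mirrors the point that the where and how details are hidden from the user.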

7 Extremely Successful
Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections
Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, …
Astronomy: Data grid
Bio-informatics: Digital library
Earth Sciences: Data grid
Ecology: Collection
Education: Persistent archive
Engineering: Digital library
Environmental science: Data grid
High energy physics: Data grid
Humanities: Data Grid
Medical community: Digital library
Oceanography: Real time sensor data, persistent archive
Seismology: Digital library, real-time sensor data
Goal has been generic infrastructure for distributed data

9 BaBar High-Energy Physics Stanford Linear Accelerator IN2P3 Lyon, France Rome, Italy San Diego RAL, UK A functioning international Data Grid for high-energy physics Manchester-SDSC mirror Moved over 300 TBs of data Increasing to 5 TBs per day

10 Requirements Driving Evolution Observe that as the size of the shared collections grows, the administrative tasks can become onerous. Data grids provide mechanisms to manage recovery from all errors that occur in the distributed environment Need to minimize labor support through automation of administrative functions File ingestion tasks Verification of desired collection properties Integrity checks and replica management

11 Requirements Driving Evolution Observe that each community has unique management policies User administration File retention & deletion Time-dependent access controls Data distribution and replication File update (versions, backups) Descriptive metadata

12 Requirements Driving Evolution Socialization of collections The creators of the collection have specific properties that they assert the collection will possess Completeness Authoritative sources Authenticity The users of the collection have their own criteria for the properties they expect Socialization is the mapping from creator assertions to user expectations

13 Data Grid Mechanisms Essential components needed for synergism implemented in SRB Infrastructure independence Data and trust virtualization Components needed for specific management policies and processes implemented in iRODS Map policies to rules that control all processes Map processes to standard micro-services

14 Data Management iRODS - integrated Rule-Oriented Data System

15 Rules
Rule classes: System enforced rules; Administrator controlled rules; User defined rules
Rule execution:
Atomic rules - executed on each operation invoked by a client
Deferred rules - executed at a future time
Periodic rules - executed to validate assessment criteria and enforce desired properties (integrity)
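
The three execution modes (atomic, deferred, periodic) can be illustrated with a toy scheduler. This is a sketch, not how the iRODS rule engine schedules work: `RuleScheduler` and its methods are hypothetical, and time is a logical tick counter rather than wall-clock time.

```python
import heapq
import itertools


class RuleScheduler:
    """Toy scheduler with a logical clock (integer ticks); hypothetical, for illustration."""

    def __init__(self):
        self.now = 0
        self._queue = []               # heap of (due_tick, seq, action, period)
        self._seq = itertools.count()  # tie-breaker so actions are never compared

    def run_atomic(self, action):
        # Atomic: executed immediately as part of the client operation.
        return action()

    def defer(self, action, delay):
        # Deferred: executed once at a future tick.
        heapq.heappush(self._queue, (self.now + delay, next(self._seq), action, None))

    def every(self, action, period):
        # Periodic: executed repeatedly, e.g. a recurring integrity check.
        # period must be a positive number of ticks.
        heapq.heappush(self._queue, (self.now + period, next(self._seq), action, period))

    def advance(self, ticks):
        # Run everything that falls due within the next `ticks` ticks, in due order.
        end = self.now + ticks
        while self._queue and self._queue[0][0] <= end:
            due, _, action, period = heapq.heappop(self._queue)
            self.now = due
            action()
            if period is not None:
                heapq.heappush(self._queue, (due + period, next(self._seq), action, period))
        self.now = end


log = []
sched = RuleScheduler()
sched.run_atomic(lambda: log.append("atomic"))
sched.defer(lambda: log.append("deferred"), delay=3)
sched.every(lambda: log.append("periodic"), period=2)
sched.advance(6)
# log is now ["atomic", "periodic", "deferred", "periodic", "periodic"]
```

The periodic action fires at ticks 2, 4, and 6, and the deferred action fires once at tick 3.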

16 iRODS Rule Syntax
Event | Condition | Action-set | Recovery-set
Event - triggered by operation or queued rule
Condition - composed of tests on any attributes in the persistent state information
Action-set - composed from both micro-services and rules
Recovery-set - used to ensure transaction semantics and consistent state information
Executed by a rule engine installed at each storage location - server side workflows
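
To make the Event | Condition | Action-set | Recovery-set structure concrete, here is a minimal Python sketch of how such a rule could be evaluated. It is not the iRODS rule engine or its rule language: `Rule`, `fire`, and the example actions are hypothetical, and the rollback policy (undo only the actions that completed, in reverse order) is an assumption made for this illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Rule:
    event: str                                 # operation name that triggers the rule
    condition: Callable[[dict], bool]          # test on the state information
    actions: List[Callable[[dict], None]]      # action-set, run in order
    recoveries: List[Callable[[dict], None]]   # recoveries[i] undoes actions[i]


def fire(rules, event, state):
    """Apply the first rule whose event and condition match; roll back on failure."""
    for rule in rules:
        if rule.event != event or not rule.condition(state):
            continue
        done = []
        try:
            for action, recovery in zip(rule.actions, rule.recoveries):
                action(state)
                done.append(recovery)
        except Exception:
            # Assumption: undo completed actions in reverse order.
            for recovery in reversed(done):
                recovery(state)
            return False
        return True
    return False


# Hypothetical micro-service-like actions for the example.
def register(state):
    state["registered"] = True

def unregister(state):
    state["registered"] = False

def replicate(state):
    raise IOError("replica resource offline")

def drop_replica(state):
    state.pop("replica", None)


rule = Rule("put", lambda s: s["size"] > 0,
            [register, replicate], [unregister, drop_replica])
state = {"size": 10, "registered": False}
ok = fire([rule], "put", state)
```

Because `replicate` fails, the completed `register` step is undone and the state ends up as it started, which is the consistency guarantee the recovery-set is meant to provide.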

17 Micro-Services Challenge is that storage systems do not provide desired processes Have “minimal” set of standard operations that are performed at the storage system Have actions required by clients such as replication, metadata extraction Create standard micro-services that aggregate storage operations into modules that can be used to implement desired processes.

18 Data Virtualization
Diagram labels: Data Grid, Access Interface, Standard Micro-services, Standard Operations, Storage Protocol, Storage System
Map from the actions requested by the access method to a standard set of micro-services. The standard micro-services are mapped to the operations supported by the storage system.
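
The two-level mapping can be sketched as follows. All names here (`InMemoryDriver`, `ms_replicate`, `ms_checksum`, `put_with_replica`) are hypothetical and stand in for storage drivers, micro-services, and a client-facing action; none of them are taken from iRODS.

```python
import hashlib


class InMemoryDriver:
    """Hypothetical storage driver exposing a minimal set of standard operations."""

    def __init__(self):
        self.blobs = {}

    def write(self, path, data):
        self.blobs[path] = data

    def read(self, path):
        return self.blobs[path]


# Micro-services: built only from the standard operations (read/write),
# so they work with any driver that provides those operations.
def ms_replicate(src, dst, path):
    dst.write(path, src.read(path))


def ms_checksum(driver, path):
    return hashlib.sha256(driver.read(path)).hexdigest()


# Client-facing action: mapped onto a fixed sequence of micro-services.
def put_with_replica(primary, secondary, path, data):
    primary.write(path, data)
    ms_replicate(primary, secondary, path)
    return ms_checksum(primary, path) == ms_checksum(secondary, path)


disk = InMemoryDriver()
tape = InMemoryDriver()
verified = put_with_replica(disk, tape, "/coll/file1", b"example bytes")
```

Swapping `InMemoryDriver` for a driver that talks to a different storage system would leave the micro-services and the client action unchanged, which is the point of the layering.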

19 integrated Rule-Oriented Data System
[Architecture diagram. Labels: Client Interface, Admin Interface, Current State, Rule Invoker, Service Manager, Engine, Rule Base, Confs, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, Consistency Check Module (three instances), Metadata Persistent Repository, Micro Service Modules (Metadata-based Services), Micro Service Modules (Resource-based Services), Resources]

20 Distributed Management System
Rule Engine, Data Transport, Metadata Catalog, Execution Control, Messaging System, Execution Engine, Virtualization, Server Side Workflow, Persistent State Information, Scheduling, Policy Management

21 Micro-service Classes Test System Workflow control Client iCAT catalog User level invoked by “irule” Image manipulation

22 Digital Preservation Preservation community is defining the rules needed to assert trustworthiness of a digital repository RLG/NARA - Trustworthy Repositories Audit & Certification: Criteria and Checklist. http://wiki.digitalrepositoryauditandcertification.org/pub/Main/ReferenceInputDocuments/trac.pdf Defined 105 rules that are being implemented in iRODS

23 RLG/NARA Assessment Example
TRAC assessment criteria
90 Verify descriptive metadata and source against SIP template and set SIP compliance flag
91 Verify descriptive metadata against semantic term list
92 Verify status of metadata catalog backup (create a snapshot of metadata catalog)
93 Verify consistency of preservation metadata after hardware change or error
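
As an illustration of what a check like criterion 91 might involve, the sketch below compares descriptive metadata against a controlled term list. The function name, the attribute names, and the term lists are all hypothetical; the actual rule implementing this criterion is not shown in the slides.

```python
def verify_against_term_list(metadata, allowed_terms):
    """Return the attributes whose values are not in the allowed term list.

    Attributes with no term list (free text such as a title) are not checked.
    """
    return sorted(
        attr for attr, value in metadata.items()
        if attr in allowed_terms and value not in allowed_terms[attr]
    )


# Hypothetical term lists and record, for illustration only.
allowed = {"discipline": {"astronomy", "seismology"}, "format": {"FITS", "SEED"}}
record = {"discipline": "astronomy", "format": "TIFF", "title": "Survey plate 12"}
violations = verify_against_term_list(record, allowed)
# violations is ["format"]
```

Run as a periodic rule, a check of this kind could flag records whose metadata has drifted from the agreed vocabulary.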

24 Classes of Assessment Criteria Collection properties List properties of associated name spaces Verify properties Compare properties with assertions Collection operations Transform file formats Migrate data Generate audit trails Structured information Parse audit trails to generate compliance reports Apply templates to extract information Apply templates to format state information

25 iRODS Development NSF - SDCI grant “Adaptive Middleware for Community Shared Collections” iRODS development, SRB maintenance NARA - Transcontinental Persistent Archive Prototype Trusted repository assessment criteria NSF - Ocean Research Interactive Observatory Network (ORION) Real-time sensor data stream management NSF - Temporal Dynamics of Learning Center data grid Management of Institutional Review Board approval

26 iRODS Development Status Current release is version 0.9.2 June 2007 Production release will be version 1.0 Fall quarter 2007 International collaborations SHAMAN - University of Liverpool Sustaining Heritage Access through Multivalent ArchiviNg UK e-Science data grid IN2P3 in Lyon, France DSpace policy management

27 Planned Development GSI support Time-limited sessions via a one-way hash authentication Python Client library GUI Browser (AJAX in development) Driver for HPSS (in development) Driver for SAM-QFS Porting to additional versions of Unix/Linux Porting to Windows Support for MySQL as the metadata catalog API support packages based on existing mounted collection driver MCAT to ICAT migration tools Extensible Metadata including Databases Access Interface Zones/Federation Auditing - mechanisms to record and track iRODS persistent state changes

28 For More Information (iRODS Tutorial on Thursday) Reagan W. Moore San Diego Supercomputer Center moore@sdsc.edu http://www.sdsc.edu/srb/ http://irods.sdsc.edu/

