From SRB to IRODS: Policy Virtualization using Rule-Based Data Grids Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center.

Slides:



Advertisements
Similar presentations
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Data Grids for Collection Federation Reagan W. Moore University.
Advertisements

GFS OGF-22 Global Resource Naming Developers: Reagan Moore Arcot Mike.
OGF-23 iRODS Metadata Grid File System Reagan Moore San Diego Supercomputer Center.
© 2006 Open Grid Forum OGF19 Federated Identity Rule-based data management Wed 11:00 AM Mountain Laurel Thurs 11:00 AM Bellflower.
Data Management Systems Richard Marciano Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center
The Storage Resource Broker and.
The Storage Resource Broker and.
Digital Preservation Lifecycle Management Building a demonstration prototype for the preservation of large-scale multi-media collections Arcot Rajasekar.
Data Grid: Storage Resource Broker Mike Smorul. SRB Overview Developed at San Diego Supercomputing Center. Provides the abstraction mechanisms needed.
NATIONAL PARTNERSHIP FOR ADVANCED COMPUTATIONAL INFRASTRUCTURE SAN DIEGO SUPERCOMPUTER CENTER Particle Physics Data Grid PPDG Data Handling System Reagan.
San Diego Supercomputer Center, University of California at San Diego Grid Physics Network (GriPhyN) University of Florida A Data Storage Language for.
San Diego Supercomputer Center NARA Research Prototype Persistent Archive Building Preservation Environments with Data Grid Technology (NARA Research Prototype.
San Diego Supercomputer CenterNational Partnership for Advanced Computational Infrastructure1 Grid Based Solutions for Distributed Data Management Reagan.
Integrated Rule Oriented Data System (iRODS) Reagan W. Moore Arcot Rajasekar Mike Wan
A Very Brief Introduction to iRODS
Security Requirements for Shared Collections Storage Resource Broker Reagan W. Moore
iRODS: Interoperability in Data Management
Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.
ADAPT An Approach to Digital Archiving and Preservation Technology Principal Investigator: Joseph JaJa Lead Programmers: Mike Smorul and Mike McGann Graduate.
Applying Data Grids to Support Distributed Data Management Storage Resource Broker Reagan W. Moore Ian Fisk Bing Zhu University of California, San Diego.
Robust Tools for Archiving and Preserving Digital Data Joseph JaJa, Mike Smorul, and Mike McGann Institute for Advanced Computer Studies Department of.
Modern Data Management Overview Storage Resource Broker Reagan W. Moore
PAWN: A Novel Ingestion Workflow Technology for Digital Preservation
Richard MARCIANO Chien-Yi HOU School of Information and Library Science (SILS) Sustainable Archives & Leveraging Technologies Group (SALT) University of.
Web-based Portal for Discovery, Retrieval and Visualization of Earth Science Datasets in Grid Environment Zhenping (Jane) Liu.
DCC Conference, Glasgow November, Digital Archive Policies and Trusted Digital Repositories MacKenzie Smith, MIT Libraries Reagan Moore, San Diego.
San Diego Supercomputer CenterUniversity of California, San Diego Preservation Research Roadmap Reagan W. Moore San Diego Supercomputer Center
Information Management and Distributed Data Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
Working Group: Practical Policy Rainer Stotzka, Reagan Moore.
Rule-Based Distributed Data Management Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center
USING METADATA TO FACILITATE UNDERSTANDING AND CERTIFICATION ABOUT THE PRESERVATION PROPERTIES OF A PRESERVATION SYSTEM Jewel H. Ward, Hao Xu, Mike C.
Managing Simulation Output Storage Resource Broker Reagan W. Moore
PERG OGF-22 Preservation Environments Research Group Organizers: Reagan Moore Richard Marciano
Rule-Based Data Management Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar {moore, schroede, mwan, {moore, schroede, mwan,
Rule-Based Distributed Data Management iRODS Jan 23, Reagan W. Moore Mike Wan Arcot Rajasekar Wayne Schroeder San Diego.
1 integrated Rule Oriented Data System Tutorial: iRODS Capabilities.
Richard MarcianoChien-Yi Hou Caryn Wojcik University of University of State of Michigan North Carolina North Carolina Records Management ServicesSALT DCAPE.
Production Data Grids SRB - iRODS Storage Resource Broker Reagan W. Moore
Rule-Based Programming for VORBs Bertram Ludaescher Arcot Rajasekar Data and Knowledge Systems San Diego Supercomputer Center U.C. San Diego.
Working Group Practical Policy based on slides and latest documents from the PP WG chaired by Reagan Moore, Rainer Stotzka presented by Johannes Reetz.
Rule-Based Preservation Systems Reagan W. Moore Wayne Schroeder Mike Wan Arcot Rajasekar Richard Marciano {moore, schroede, mwan, sekar,
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Management of Distributed Data Reagan W. Moore.
National Partnership for Advanced Computational Infrastructure San Diego Supercomputer Center Persistent Archive for the NSDL Reagan W. Moore Charlie Cowart.
SRB 1 & iRODS 2 Arcot Rajasekar Reagan Moore Mike Wan SDSC/UCSD Pathways to OOI-CI CyberData Architecture 1 Storage Resource Broker 2 integrated Rule Oriented.
Interoperability of Digital Repositories Adil Hasan Univ of Liverpool.
Archival Workshop on Ingest, Identification, and Certification Standards Certification (Best Practices) Checklist Does the archive have a written plan.
Policy Based Data Management Data-Intensive Computing Distributed Collections Grid-Enabled Storage iRODS Reagan W. Moore 1.
NA-MIC National Alliance for Medical Image Computing UCSD: Engineering Core 2 Portal and Grid Infrastructure.
Data Management Planning Session Kevin Gomes Michael Meisinger Arcot Rajasekar Michael Wan October 19, 2007.
GGF-17 Preservation Environments Research Group Preservation Environment Working Group Officers: Bruce Barkstrom (NASA Langley) Reagan.
Introduction to The Storage Resource.
National Science Foundation Cooperative Agreement: OCI Reagan Moore, PI Mary Whitton, Project Manager.
Preserving Electronic Mailing Lists as Scholarly Resources: The H-Net Archives Lisa M. Schmidt
©MIT LKTR Workshop, Digital Archive Policies and Trusted Digital Repositories MacKenzie Smith, MIT Libraries Reagan Moore, San Diego Supercomputer.
Biomedical Informatics Research Network The Storage Resource Broker & Integration with NMI Middleware Arcot Rajasekar, BIRN-CC SDSC October 9th 2002 BIRN.
National Archives and Records Administration1 Integrated Rules Ordered Data System (“IRODS”) Technology Research: Digital Preservation Technology in a.
Rights Management for Shared Collections Storage Resource Broker Reagan W. Moore
Building Preservation Environments with Data Grid Technology Reagan W. Moore Presenter: Praveen Namburi.
Collection-Based Persistent Archives Arcot Rajasekar, Richard Marciano, Reagan Moore San Diego Supercomputer Center Presented by: Preetham A Gowda.
Use of Policies to Enforce Collection Properties Richard Marciano Reagan Moore University of North Chapel Hill Data Intensive Cyber Environments.
Preservation Data Services Persistent Archive Research Group Reagan W. Moore October 1, 2003.
Working Group: Data Foundations and Terminology (Practical Policy Considerations) Reagan Moore.
Building Preservation Environments from Federated Data Grids Reagan W. Moore San Diego Supercomputer Center Storage.
Data Grids, Digital Libraries and Persistent Archives: An Integrated Approach to Publishing, Sharing and Archiving Data. Written By: R. Moore, A. Rajasekar,
An Overview of iRODS Integrated Rule-Oriented Data System
Introduction to iRODS Jean-Yves Nief.
Policy-Based Data Management integrated Rule Oriented Data System
Arcot Rajasekar Michael Wan Reagan Moore (sekar, mwan,
Technical Issues in Sustainability
Presentation transcript:

From SRB to IRODS: Policy Virtualization using Rule-Based Data Grids Reagan W. Moore Wayne Schroeder Arcot Rajasekar Mike Wan San Diego Supercomputer Center

Data Grid Evolution Data grids Infrastructure independence Data sharing through data and trust virtualization SRB - Storage Resource Broker Rule-based data grids Automation of management policies Management virtualization Open source software iRODS - integrated Rule-Oriented Data System

Data Management Applications Data grids Share data - organize distributed data as a collection Digital libraries Publish data - support browsing and discovery Persistent archives Preserve data - manage technology evolution Real-time sensor systems Federate sensor data - integrate across sensor streams Workflow systems Analyze data - integrate client- & server-side workflows

Generic Infrastructure Data grids organize distributed data into shared collections Persistent name spaces for files, users, storage Collection attributes Provenance, descriptive, system metadata Data grids manage heterogeneous storage systems Standard operations across file systems, tape archives, object ring buffers Enable technology evolution At the point in time when new technology is available, both the old and new systems can be integrated

Data Grid Using a Data Grid – in Abstract Ask for data User asks for data from the data grid Data delivered The data is found and returned Where & how details are hidden

Using a Data Grid - Details iRODS Server Data request goes to iRODS Server iRODS Server Metadata Catalog DB Server looks up information in catalog Catalog tells which iRODS server has data 1 st server asks 2 nd for data The 2nd iRODS server applies rules User asks for data

Extremely Successful Storage Resource Broker (SRB) manages 2 PBs of data in internationally shared collections Data collections for NSF, NARA, NASA, DOE, DOD, NIH, LC, NHPRC, IMLS; APAC, UK e-Science, IN2P3, KEK, … Astronomy Data grid Bio-informaticsDigital library Earth SciencesData grid EcologyCollection EducationPersistent archive EngineeringDigital library Environmental science Data grid High energy physicsData grid HumanitiesData Grid Medical communityDigital library OceanographyReal time sensor data, persistent archive SeismologyDigital library, real-time sensor data Goal has been generic infrastructure for distributed data

BaBar High-Energy Physics Stanford Linear Accelerator IN2P3 Lyon, France Rome, Italy San Diego RAL, UK A functioning international Data Grid for high-energy physics Manchester-SDSC mirror Moved over 300 TBs of data Increasing to 5 TBs per day

Requirements Driving Evolution Observe that as the size of the shared collections grow, the administrative tasks can become onerous. Data grids provide mechanisms to manage recovery from all errors that occur in the distributed environment Need to minimize labor support through automation of administrative functions File ingestion tasks Verification of desired collection properties Integrity checks and replica management

Requirements Driving Evolution Observe that each community has unique management policies User administration File retention & deletion Time-dependent access controls Data distribution and replication File update (versions, backups) Descriptive metadata

Requirements Driving Evolution Socialization of collections The creators of the collection have specific properties that they assert the collection will possess Completeness Authoritative sources Authenticity The users of the collection have their own criteria for the properties they expect Socialization is the mapping from creator assertions to user expectations

Data Grid Mechanisms Essential components needed for synergism implemented in SRB Infrastructure independence Data and trust virtualization Components needed for specific management policies and processes implemented in iRODS Map policies to rules that control all processes Map processes to standard micro-services

Data Management iRODS - integrated Rule-Oriented Data System

Rules Rule classes System enforced rules Administrator controlled rules User defined rules Rule execution Atomic rules - executed on each operation invoked by a client Deferred rules - executed at a future time Periodic rules - executed to validate assessment criteria and enforce desired properties (integrity)

iRODS Rule Syntax Event | Condition | Action-set | Recovery-set Event - triggered by operation or queued rule Condition- composed of tests on any attributes in the persistent state information Action-set - composed from both micro-services and rules Recovery-set - used to ensure transaction semantics and consistent state information Executed by a rule engine installed at each storage location - server side workflows

Micro-Services Challenge is that storage systems do not provide desired processes Have “minimal” set of standard operations that are performed at the storage system Have actions required by clients such as replication, metadata extraction Create standard micro-services that aggregate storage operations into modules that can be used to implement desired processes.

Data Virtualization Storage System Storage Protocol Access Interface Standard Micro-services Data Grid Map from the actions requested by the access method to a standard set of micro- services. The standard micro- services are mapped to the operations supported by the storage system Standard Operations

integrated Rule-Oriented Data System Client InterfaceAdmin Interface Current State Rule Invoker Micro Service Modules Metadata-based Services Resources Micro Service Modules Resource-based Services Service Manager Consistency Check Module Rule Modifier Module Consistency Check Module Engine Rule Confs Config Modifier Module Metadata Modifier Module Metadata Persistent Repository Consistency Check Module Rule Base

Distributed Management System RuleEngine DataTransport MetadataCatalog ExecutionControl MessagingSystem ExecutionEngine Virtualization ServerSideWorkflow PersistentStateinformation Scheduling PolicyManagement

Micro-service Classes Test System Workflow control Client iCAT catalog User level invoked by “irule” Image manipulation

Digital Preservation Preservation community is defining the rules need to assert trustworthiness of a digital repository RLG/NARA - Trustworthy Repositories Audit & Certification: Criteria and Checklist. pub/Main/ReferenceInputDocuments/trac.pdf Defined 105 rules that are being implemented in iRODS

RLG/NARA Assessment Example TRAC assessment criteria 90Verify descriptive metadata and source against SIP template and set SIP compliance flag 91Verify descriptive metadata against semantic term list 92Verify status of metadata catalog backup (create a snapshot of metadata catalog) 93Verify consistency of preservation metadata after hardware change or error

Classes of Assessment Criteria Collection properties List properties of associated name spaces Verify properties Compare properties with assertions Collection operations Transform file formats Migrate data Generate audit trails Structured information Parse audit trails to generate compliance reports Apply templates to extract information Apply templates to format state information

iRODS Development NSF - SDCI grant “Adaptive Middleware for Community Shared Collections” iRODS development, SRB maintenance NARA - Transcontinental Persistent Archive Prototype Trusted repository assessment criteria NSF - Ocean Research Interactive Observatory Network (ORION) Real-time sensor data stream management NSF - Temporal Dynamics of Learning Center data grid Management of Institution Research Board approval

iRODS Development Status Current release is version June 2007 Production release will be version 1.0 Fall quarter 2007 International collaborations SHAMAN - University of Liverpool Sustaining Heritage Access through Multivalent ArchiviNg UK e-Science data grid IN2P3 in Lyon, France DSpace policy management

Planned Development GSI support Time-limited sessions via a one-way hash authentication Python Client library GUI Browser (AJAX in development) Driver for HPSS (in development) Driver for SAM-QFS Porting to additional versions of Unix/Linux Porting to Windows Support for MySQL as the metadata catalog API support packages based on existing mounted collection driver MCAT to ICAT migration tools Extensible Metadata including Databases Access Interface Zones/Federation Auditing - mechanisms to record and track iRODS persistent state changes

For More Information (iRODS Tutorial on Thursday) Reagan W. Moore San Diego Supercomputer Center