Rule-Based Data Management Systems. Reagan W. Moore, Wayne Schroeder, Mike Wan, Arcot Rajasekar {moore, schroede, mwan,



Topics: Managing distributed shared collections; data grids. Control of name spaces - SRB: production system; data and trust virtualization; infrastructure independence. Control of management policies - iRODS: next generation technology; management virtualization; rules controlling remote operations; constraints on the rules and remote operations.

Data Management Applications: Data grids (share data); digital libraries (publish data); persistent archives (preserve data); real-time sensor streams (data federation); data analysis (automate access to distributed data).

Concepts. Distributed data management: data virtualization (manage the properties of a shared collection independently of the storage systems); trust virtualization (administrative domain independence); federation (managing interactions between data grids). Rule-based data management: policy virtualization; automating execution of management policies; applying management policies to remote operations.

Using a Data Grid - in Abstract. Ask for data: the user asks for data from the data grid. Data delivered: the data is found and returned. The where and how details are hidden.

Using a Data Grid - Details (Storage Resource Broker servers and a metadata catalog DB). 1. The user asks for data. 2. The data request goes to an SRB server. 3. The server looks up information in the catalog. 4. The catalog tells which SRB server has the data. 5. The 1st server asks the 2nd for the data. 6. The data is found and returned.
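
The request path on this slide can be sketched in a few lines of Python. This is only an illustration of the lookup-then-forward pattern, not the SRB API: the class names `MetadataCatalog` and `BrokerServer`, the server names, and the logical path are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Hypothetical catalog: logical file name -> name of the server holding it."""
    locations: dict = field(default_factory=dict)

    def locate(self, logical_name: str) -> str:
        return self.locations[logical_name]


@dataclass
class BrokerServer:
    """Hypothetical broker server; not the real SRB server interface."""
    name: str
    catalog: MetadataCatalog
    peers: dict = field(default_factory=dict)        # server name -> BrokerServer
    local_files: dict = field(default_factory=dict)  # logical name -> bytes

    def get(self, logical_name: str) -> bytes:
        # The server looks up the owning server in the catalog.
        owner = self.catalog.locate(logical_name)
        if owner == self.name:
            return self.local_files[logical_name]
        # The first server asks the second server for the data.
        return self.peers[owner].get(logical_name)


catalog = MetadataCatalog({"/home/demo/run1.dat": "server-b"})
server_a = BrokerServer("server-a", catalog)
server_b = BrokerServer("server-b", catalog,
                        local_files={"/home/demo/run1.dat": b"payload"})
server_a.peers["server-b"] = server_b

# The user only talks to server-a and never learns where the data lives.
data = server_a.get("/home/demo/run1.dat")
```

The caller names the data by its logical path only; which server and which storage system answer is decided by the catalog.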

Data Virtualization: Manage properties of each digital entity independently of the remote storage systems (infrastructure independence). Properties of the shared collection: name spaces; persistent state information (location, size, ...). Manage standard operations: map from client requests to standard operations; map from standard operations to the remote storage system protocol.

Data Virtualization. Storage repository: storage location; user name; file name; file context (creation date, ...); access controls. Data grid: logical resource name space; logical user name space; logical file name space; logical context (metadata); access constraints. Data collection, reached through data access methods (C library, Unix, Web browser). Data is organized as a shared collection.
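
A minimal sketch of what the logical name spaces buy, assuming a catalog entry shaped like the hypothetical `LogicalFile` below (the field names and paths are illustrative, not taken from SRB or iRODS): the logical name, owner, metadata, and access constraints live in the data grid, while the physical location is just replaceable state.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    resource: str        # logical resource name, e.g. "disk-cache"
    physical_path: str   # meaningful only to that storage system


@dataclass
class LogicalFile:
    logical_name: str                             # logical file name space
    owner: str                                    # logical user name, not a local account
    metadata: dict = field(default_factory=dict)  # logical context
    acl: dict = field(default_factory=dict)       # user -> set of permissions
    replicas: list = field(default_factory=list)  # persistent state: locations


def migrate(entry: LogicalFile, old_resource: str,
            new_resource: str, new_path: str) -> None:
    """Point the entry at a new storage system; the logical name is untouched."""
    entry.replicas = [
        Replica(new_resource, new_path) if r.resource == old_resource else r
        for r in entry.replicas
    ]
```

After `migrate`, clients keep using the same logical name even though the storage location, and possibly the storage technology, has changed.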

Data Virtualization. Layers: access interface; standard access actions; data grid; standard micro-services; storage protocol; storage system. The data grid maps from the actions requested by the access method to a standard set of micro-services used to interact with the storage system.

Standard Operations. File manipulation: POSIX I/O calls (open, close, read, write, seek, ...); register, replicate, checksum, synchronize. Bulk operations: bulk data transport, metadata load; parallel I/O streams. Remote procedures: data filtering, subsetting, metadata extraction; remote library execution (HDFv5, DataCutter).
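
One way to picture the mapping from standard operations to storage protocols is a small driver interface. The sketch below is an assumption-laden illustration, not SRB code: `StorageDriver`, `PosixLikeDriver`, and `ArchiveLikeDriver` are hypothetical names, and both back ends are in-memory stand-ins.

```python
from abc import ABC, abstractmethod


class StorageDriver(ABC):
    """Standard operations every storage system must expose (hypothetical)."""

    @abstractmethod
    def read(self, path: str) -> bytes: ...

    @abstractmethod
    def write(self, path: str, data: bytes) -> None: ...


class PosixLikeDriver(StorageDriver):
    """Stand-in for a file system: a dict keyed by path."""

    def __init__(self):
        self._files = {}

    def read(self, path):
        return self._files[path]

    def write(self, path, data):
        self._files[path] = data


class ArchiveLikeDriver(StorageDriver):
    """Stand-in for an archive that must stage data before it can be read."""

    def __init__(self):
        self._tape, self._staged = {}, {}

    def read(self, path):
        if path not in self._staged:       # protocol-specific extra step
            self._staged[path] = self._tape[path]
        return self._staged[path]

    def write(self, path, data):
        self._tape[path] = data
        self._staged.pop(path, None)       # drop any stale staged copy


def replicate(src: StorageDriver, src_path: str,
              dst: StorageDriver, dst_path: str) -> None:
    """A standard operation written once against the driver interface."""
    dst.write(dst_path, src.read(src_path))
```

`replicate` never needs to know that one side stages data and the other does not; that difference stays inside the drivers.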

BaBar High-Energy Physics: a functioning international data grid for high-energy physics. Sites: Stanford Linear Accelerator; IN2P3, Lyon, France; Rome, Italy; San Diego; RAL, UK. Manchester-SDSC mirror. Moved over 300 TBs of data, increasing to 5 TBs per day.

Next Generation Technology. Every fault that occurs in the distributed environment is the responsibility of the data grid: network outage, system crash, operator error. Minimize risk through checksums, replicas, synchronization, federation. Management of large collections is labor intensive, for example the initiation of recovery operations after a remote system failure. Need to automate execution of management policies.
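
As a rough illustration of the kind of recovery operation that is worth automating, the sketch below validates each replica against a recorded checksum and overwrites damaged copies from a good one. The function names and the choice of SHA-256 are assumptions made for the example; the slides do not say which checksum the data grid uses.

```python
import hashlib


def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def repair_replicas(replicas: dict, expected: str) -> list:
    """Overwrite corrupted replicas from a valid copy.

    replicas maps a site name to the bytes stored there; expected is the
    checksum recorded at ingestion. Returns the names of repaired sites.
    """
    good = [name for name, data in replicas.items() if sha256(data) == expected]
    if not good:
        raise RuntimeError("no valid replica left to recover from")
    source = replicas[good[0]]
    repaired = []
    for name, data in replicas.items():
        if sha256(data) != expected:
            replicas[name] = source   # synchronize from the good copy
            repaired.append(name)
    return repaired
```

Run periodically, a check like this turns a manual recovery task into a policy the system enforces on its own.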

Controlling Remote Operations: iRODS (integrated Rule-Oriented Data System). Support unique organizational / social management policies for each collection.

Rule-based Data Management: Express assessment criteria through sets of required persistent state information. Express management policies as sets of rules controlling the execution of micro-services. Express capabilities as sets of micro-services. Manage persistent state information resulting from the application of rules controlling execution of remote micro-services.

Management Virtualization: examples of management policies. Integrity: validation of checksums; synchronization of replicas; data distribution; data retention; access controls. Authenticity: chain of custody (audit trails); track required preservation metadata (templates); generation of Archival Information Packages.

Rule-based Data Management. Rules required for standard operations: POSIX I/O control; standard SRB operations. Administrator-controlled rules to implement management policies: administrative (adding / deleting users, resources); data ingestion (pre-processing, post-processing); data transport / deletion (parallel I/O streams, disposition). User-defined rules: create your own server-side workflow. A rule set can be defined for a particular collection, particular user group, particular storage system, or particular micro-service.

iRODS Rule. Each rule defines: an event; a condition; action sets (micro-services and rules); recovery sets. Rule types: atomic, applied immediately; deferred, supporting deferred consistency constraints; periodic, typically used to validate assertions.
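
The event / condition / action set / recovery set structure can be sketched as follows. This is a toy model in Python, not iRODS rule syntax or its engine: `Step`, `Rule`, `fire`, the event name `post_put`, and the rollback order (recovery of completed steps, newest first) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Step:
    action: Callable[[dict], None]     # stand-in for a micro-service
    recovery: Callable[[dict], None]   # undoes the action if a later step fails


@dataclass
class Rule:
    event: str
    condition: Callable[[dict], bool]
    steps: list


def fire(rules, event, ctx):
    """Run the first rule whose event and condition match.

    Returns True if every action succeeded. On failure, runs the recovery
    of each completed step in reverse order and returns False.
    """
    for rule in rules:
        if rule.event != event or not rule.condition(ctx):
            continue
        done = []
        try:
            for step in rule.steps:
                step.action(ctx)
                done.append(step)
        except Exception:
            for step in reversed(done):
                step.recovery(ctx)
            return False
        return True
    return False


def fail(ctx):
    raise RuntimeError("storage offline")


ctx = {"path": "/zone/home/demo/a.dat", "log": []}
rule = Rule(
    event="post_put",
    condition=lambda c: c["path"].startswith("/zone/home/demo/"),
    steps=[
        Step(lambda c: c["log"].append("checksum"),
             lambda c: c["log"].append("undo checksum")),
        Step(fail, lambda c: None),
    ],
)
ok = fire([rule], "post_put", ctx)
```

Here the second step fails, so the first step's recovery runs and the context log records both the action and its undo.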

Rule-based Access. Associate security policies with each digital entity: redaction, access controls on structures within a file; time-dependent access controls (how long to hold data proprietary). Associate access controls with each rule: restrict the ability to modify or apply rules. Associate access controls with each micro-service: explicit control of operation execution within a given collection. Much finer control than provided by Unix r:w:x.
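
Two of these controls, a proprietary period and per-micro-service permissions, are easy to sketch. The `CollectionPolicy` class and its field names below are hypothetical and only show the shape of the checks; they are not an iRODS interface.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CollectionPolicy:
    owner: str
    release_date: date                               # end of the proprietary period
    allowed_ops: dict = field(default_factory=dict)  # micro-service name -> set of users

    def may_read(self, user: str, today: date) -> bool:
        """The owner can always read; others only once the data is released."""
        return user == self.owner or today >= self.release_date

    def may_execute(self, user: str, op: str) -> bool:
        """Each micro-service carries its own access list; the default is deny."""
        return user in self.allowed_ops.get(op, set())
```

Unlike a fixed set of permission bits, the policy can depend on time and on the specific operation being requested.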

Federation Between Data Grids. Data Collection A and Data Collection B each sit behind their own data grid, and each data grid provides: logical resource name space; logical user name space; logical file name space; logical rule name space; logical micro-service name; logical persistent state. Data access methods: Web browser, DSpace, OAI-PMH.

Rule-based Federation. When registering a digital entity into another data grid, register the required management rules along with the digital entity: move management policies with the data. The expectation is that each operation on each digital entity can be controlled across federated data grids. An example is end-to-end encryption.

Evolution of Rule-based Systems. Logical name spaces enable dynamic addition of new rules, micro-services, and state information. Apply new rules on one collection while applying old rule sets on a legacy collection; old and new rule sets can run in parallel. Can build a system that manages its own evolution: can create rules that track the evolution of the rule-based system; can create rules that govern migration to new rule sets.
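
Running old and new rule sets side by side amounts to choosing a rule set per collection. A minimal sketch, assuming rule sets are identified by name and assigned by collection path (the function name, the paths, and the longest-prefix choice are all hypothetical):

```python
def rule_set_for(collection: str, assignments: dict, default: str) -> str:
    """Return the rule set assigned to the closest enclosing collection."""
    best = None
    for prefix in assignments:
        # Match the collection itself or anything below it, on path boundaries.
        inside = (collection == prefix
                  or collection.startswith(prefix.rstrip("/") + "/"))
        if inside and (best is None or len(prefix) > len(best)):
            best = prefix
    return assignments[best] if best is not None else default


assignments = {"/zone/legacy": "rules-v1", "/zone/new": "rules-v2"}
```

Migrating a collection to a new rule set is then a change to one entry in `assignments`, which itself could be governed by a rule.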

Assessment Rules. Can build a system that monitors its own state information: parse audit trails to verify accesses by authorized persons; parse persistent state information for compliance with management rules; test micro-services for compliance with rules. Audit all accesses to a collection. Compare system properties to desired outcomes.
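
Parsing an audit trail to verify that accesses came from authorized persons could look like the sketch below. The record layout (`user`, `collection`, `op`) and the sample names are assumptions for illustration; the slides do not specify the audit trail format.

```python
def unauthorized_accesses(audit_trail, authorized):
    """Return audit entries whose user is not authorized for the collection.

    audit_trail is a list of dicts with "user" and "collection" keys;
    authorized maps a collection name to the set of permitted users.
    """
    return [
        entry for entry in audit_trail
        if entry["user"] not in authorized.get(entry["collection"], set())
    ]


audit_trail = [
    {"user": "alice", "collection": "/zone/archive", "op": "read"},
    {"user": "mallory", "collection": "/zone/archive", "op": "read"},
]
authorized = {"/zone/archive": {"alice", "bob"}}
violations = unauthorized_accesses(audit_trail, authorized)
```

An empty result is the desired outcome; a periodic rule could run this check and raise an alert otherwise.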

For More Information Reagan W. Moore San Diego Supercomputer Center SRB: iRODS: