Policy-Based Data Management integrated Rule Oriented Data System


1 Policy-Based Data Management integrated Rule Oriented Data System
Reagan Moore, Arcot Rajasekar, Mike Wan

2 Preservation is a Stage in the Data Life Cycle
Each data life cycle stage re-purposes the original collection
Project Collection - Private - Local Policy
Data Grid - Shared - Distribution Policy
Data Processing Pipeline - Analyzed - Service Policy
Digital Library - Published - Description Policy
Reference Collection - Preserved - Representation Policy
Federation - Sustained - Re-purposing Policy
Stages correspond to addition of new policies for a broader community
Virtualize the stages of the data life cycle through policy evolution
Interoperability across data life cycle representations

3 Policy-based Preservation Environment
Purpose - reason a preservation environment is assembled
Properties - attributes needed to ensure the purpose
Policies - controls for ensuring maintenance of properties
Procedures - functions that implement the policies
State information - results of applying the procedures
Assessment criteria - validation that state information conforms to the desired purpose
Federation - controlled sharing of logical name spaces
These are the necessary elements for a preservation environment

4 iRODS - Policy-based Data Management
Turn policies into computer actionable rules
  Compose rules by chaining standard operations
  Standard operations (micro-services) executed at the remote storage location
Manage state information as attributes on namespaces:
  Files / collections / users / resources / rules
Validate assessment criteria
  Queries on state information, parsing of audit trails
Automate administrative functions
  Minimize labor costs

5 Policy-based Preservation - Authenticity
Purpose - Maintain authenticity of records
Properties - Define template for required representation information
Policies - Extract and register representation information for each file on ingestion
Procedures - Parse record / XML file to extract metadata
State information - Register representation information into metadata catalog
Assessment criteria - Compare registered metadata with template defining required values
A preservation environment should automate each of these steps
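The authenticity steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not iRODS code: the attribute names in the template, the flat XML record layout, and the in-memory `catalog` dictionary standing in for the metadata catalog are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Properties: hypothetical template of required representation information.
REQUIRED_ATTRIBUTES = {"title", "creator", "date_created", "format"}

# State information: an in-memory stand-in for the metadata catalog.
catalog = {}


def extract_metadata(xml_text):
    """Procedure: parse a flat XML record into a {tag: text} dictionary."""
    root = ET.fromstring(xml_text)
    return {
        child.tag: child.text.strip()
        for child in root
        if child.text and child.text.strip()
    }


def missing_attributes(metadata, template=REQUIRED_ATTRIBUTES):
    """Assessment: list the template attributes absent from the metadata."""
    return sorted(template - set(metadata))


def ingest(name, xml_text):
    """Policy: extract and register metadata on ingestion, then assess it."""
    catalog[name] = extract_metadata(xml_text)
    return missing_attributes(catalog[name])
```

Calling `ingest` on a record that lacks `date_created` and `format` would return those two names, which is the kind of result an assessment report could be built from.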

6 Assessment Criteria
NARA Electronic Records Archive capabilities list
  853 defined capabilities
  Mapped to 174 computer actionable rules
  Mapped to 212 state information attributes
RLG/NARA Trusted Repository Audit Checklist
  Mapped to 105 computer actionable rules
  Included 66 rules specific to preservation
ISO Mission Operations Information Management System repository audit checklist
  106 policies for operation and control
  Mapped to 52 computer actionable rules

7 Examples of Assessment Criteria
Specify a template that governs
  the representation information required for a specific record series
  content of a Submission Information Package (SIP)
  content of an Archival Information Package (AIP)
  number of replicas
Verify
  compliance of SIP with specification
  compliance of AIP with specification
  compliance with required replica number
  integrity of the replicas
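As a rough illustration of the last two checks (replica number and replica integrity), the sketch below verifies a set of replicas against an expected checksum. The choice of SHA-256, the location names, and the use of in-memory bytes instead of stored files are assumptions made for the example.

```python
import hashlib


def checksum(data):
    """Return the SHA-256 hex digest of a replica's bytes."""
    return hashlib.sha256(data).hexdigest()


def verify_replicas(replicas, expected_checksum, required_count):
    """Check replica count and integrity; return a list of problems found.

    `replicas` maps a (hypothetical) storage location name to the bytes
    held there. An empty result means both criteria are satisfied.
    """
    problems = []
    if len(replicas) < required_count:
        problems.append(
            f"only {len(replicas)} of {required_count} required replicas"
        )
    for location, data in sorted(replicas.items()):
        if checksum(data) != expected_checksum:
            problems.append(f"replica at {location} fails checksum")
    return problems
```

Returning a list of problems rather than a single boolean keeps the result usable as state information: each entry says which criterion failed and where.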

8 Preservation Communities
NARA Transcontinental Persistent Archive Prototype
  Develop policies to automate preservation of selected digital holdings
National Optical Astronomy Observatory
  Accession images from a telescope in Chile
Carolina Digital Repository
  Preserve institutional collections

9 Federation of Seven Independent Data Grids
National Archives and Records Administration Transcontinental Persistent Archive Prototype
[Diagram: seven data grids, each with its own MCAT: NARA I, NARA II, Georgia Tech, Rocket Center, U NC, U Md, UCSD]
Extensible environment; can federate with additional research and education sites.
Each data grid can use different vendor products.
Policy to coalesce authentic records from independent data grids.
Choose whether to write to a central archive or use soft links.

10 NOAO Zone Architecture
[Diagram: Telescope, Telescope, Archive]

11 Carolina Digital Repository
Architecture:
  Web interface
  Fedora digital library middleware
  iRODS data grid
Supports:
  Registration of file into iRODS
  Generation of FOXML
  Registration into Fedora
  Query through Fedora
  Synchronization of catalogs
From "Conceptualizing Policy-Driven Repository Interoperability (PoDRI) Using iRODS and Fedora" (Pcolar, Davis, Zhu, Chassanoff, Hou, Marciano)

12 Preservation Concepts
Preservation environments are inherently distributed and federated
  Mitigate risk of data loss
  Mitigate dependence on a single vendor
  Mitigate dependence on a single institution
Management of technology evolution can be done through the same mechanisms that support interoperability across heterogeneous storage systems
  At the point in time when new technology is added, both the old and new technologies are present
  Migrate from old protocols to new protocols using data grids

13 Preservation Concepts (Cont.)
Preservation requires management of communication with the future
  Need to migrate records to future technology
  Need procedural infrastructure independence to ensure that data formats can be parsed in the future
Preservation requires management of communication from the past
  Need to know what policies and procedures were applied by prior archivists
  Need to validate that policies were enforced
Federation minimizes risk of data loss
  Deep archive implemented through rules that:
    turn on data staging, data versioning, replication
    turn off deletion, external write, external data grid access

14 Preservation Concepts (Cont.)
Periodic verification of assessment criteria
  Check that required properties still hold
  These rules are in addition to the rules that enforce policies
Compare values in metadata catalog with expected values
  Number of replicas, checksums of files, required metadata
Verify relationships between files in storage and entries in metadata catalog
  Metadata record <----> files in storage
Parse audit trails to track compliance over time
  Evaluate impact of changing preservation policy
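Parsing audit trails to track compliance over time might look like the sketch below. The audit line layout (date, check name, object path, outcome, separated by whitespace) is a hypothetical format invented for this example and is not the actual iRODS audit trail format.

```python
from collections import defaultdict


def compliance_by_month(audit_lines):
    """Return {"YYYY-MM": fraction of checks that passed} from audit lines.

    Each line is assumed to look like:
        2009-01-05 checksum_verify /zone/home/a.dat PASS
    Lines that do not have exactly four fields are skipped.
    """
    totals = defaultdict(lambda: [0, 0])  # month -> [passed, checked]
    for line in audit_lines:
        parts = line.split()
        if len(parts) != 4:
            continue
        date, _check, _path, outcome = parts
        month = date[:7]
        totals[month][1] += 1
        if outcome == "PASS":
            totals[month][0] += 1
    return {
        month: passed / checked
        for month, (passed, checked) in sorted(totals.items())
    }
```

Comparing the monthly fractions before and after a policy change is one simple way to evaluate the impact of that change.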

15 Overview of iRODS Architecture
User - can search, access, add, and manage data and metadata*
iRODS Data System
  iRODS Metadata Catalog - Track information
  iRODS Rule Engine - Track policies
  iRODS Data Server - Disk, tape, etc.
*Access data with a Web-based browser, the iRODS GUI, or command-line clients.

16 Managing Properties of Records
Namespaces
  Record (file name)
  Users
  Storage resources
  Rules
State information
  User-defined metadata (provenance)
  System attributes
Procedures
  Basic operations performed on data
  Store, retrieve, move, copy, replicate, parse, aggregate
  Extract metadata, checksum, synchronize, version

17 Migration of Procedures
Map from actions requested by the access method to a standard set of micro-services.
Map the standard micro-services to standard operations.
Map the operations to the protocol supported by the operating system.
[Diagram layers, top to bottom: Access Interface, Standard Micro-services, Data Grid, Standard Operations, Storage Protocol, Storage System]
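The three mappings can be pictured as three lookup tables applied in sequence. The sketch below is a toy model only: every action, micro-service, operation, and protocol name in it is hypothetical and chosen for illustration, not taken from iRODS.

```python
# Layer 1 (hypothetical): access-method action -> standard micro-services.
ACTION_TO_MICROSERVICES = {
    "put": ["store_data"],
    "get": ["retrieve_data"],
    "copy": ["retrieve_data", "store_data"],
}

# Layer 2 (hypothetical): micro-service -> standard operations.
MICROSERVICE_TO_OPERATIONS = {
    "store_data": ["open", "write", "close"],
    "retrieve_data": ["open", "read", "close"],
}

# Layer 3 (hypothetical): standard operation -> storage-specific protocol call.
STORAGE_PROTOCOLS = {
    "unix_fs": {
        "open": "open()", "read": "read()",
        "write": "write()", "close": "close()",
    },
    "tape_archive": {
        "open": "mount", "read": "stream_read",
        "write": "stream_write", "close": "unmount",
    },
}


def plan(action, storage_system):
    """Expand an access action into the protocol calls for one storage system."""
    protocol = STORAGE_PROTOCOLS[storage_system]
    calls = []
    for micro_service in ACTION_TO_MICROSERVICES[action]:
        for operation in MICROSERVICE_TO_OPERATIONS[micro_service]:
            calls.append(protocol[operation])
    return calls
```

In this model, migrating to a new storage technology only adds an entry to the third table, and a new access method only adds entries to the first; the micro-services in the middle stay unchanged.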

18 Format of an iRODS Rule
Action | Condition | MS1, …, MSn | RMS1, …, RMSn
Action
  Name of action to be performed
  Name known to the server and invoked by server
Condition - condition under which the rule applies
Micro-services - chain of micro-services to be executed
Recovery micro-service - if any micro-service fails, recovery micro-service(s) are executed to maintain transactional consistency
Example of MS / RMS pairs
  createFile(*F) / removeFile(*F)
  ingestMetadata(*F,*M) / rollback
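The pairing of micro-services with recovery micro-services can be modeled in Python as follows. This is a simplified sketch and not the iRODS rule engine: the `apply_rule` function, the context dictionary, the `.xml` condition, and the choice to run recovery steps in reverse order after a failure are all assumptions made for illustration.

```python
def apply_rule(condition, steps, context):
    """Run (micro_service, recovery) pairs in order if the condition holds.

    On failure, run the failed step's recovery, then the recoveries of the
    steps that already completed, in reverse order (an assumed ordering).
    Returns True if every micro-service succeeded, otherwise False.
    """
    if not condition(context):
        return False
    completed = []
    for micro_service, recovery in steps:
        try:
            micro_service(context)
        except Exception:
            recovery(context)
            for earlier_recovery in reversed(completed):
                earlier_recovery(context)
            return False
        completed.append(recovery)
    return True


# Hypothetical Python stand-ins for the MS / RMS pairs named on the slide.
def create_file(ctx):
    ctx["storage"].add(ctx["name"])


def remove_file(ctx):
    ctx["storage"].discard(ctx["name"])


def ingest_metadata(ctx):
    if not ctx.get("metadata"):
        raise ValueError("no metadata supplied")
    ctx["catalog"][ctx["name"]] = ctx["metadata"]


def rollback(ctx):
    ctx["catalog"].pop(ctx["name"], None)


def is_xml_record(ctx):
    """Hypothetical condition: the rule applies only to .xml records."""
    return ctx["name"].endswith(".xml")


INGEST_STEPS = [(create_file, remove_file), (ingest_metadata, rollback)]
```

If `ingest_metadata` fails because no metadata was supplied, `rollback` and then `remove_file` run, so the file does not remain in storage without a catalog entry.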

19 iRODS - Distributed Operating System

20 iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2009 Budget Supplement in the area of Human and Computer Interaction Information Management technology research.
Reagan W. Moore
NSF OCI "NARA Transcontinental Persistent Archives Prototype"
NSF SDCI "Data Grids for Community Driven Applications"

