© 2006 Open Grid Forum Preservation Environment Research Group Rule-based preservation.


1 © 2006 Open Grid Forum Preservation Environment Research Group Rule-based preservation

2 OGF IPR Policies Apply
"I acknowledge that participation in this meeting is subject to the OGF Intellectual Property Policy."
Intellectual Property Notices Note Well: All statements related to the activities of the OGF and addressed to the OGF are subject to all provisions of Appendix B of GFD-C.1, which grants to the OGF and its participants certain licenses and rights in such statements. Such statements include verbal statements in OGF meetings, as well as written and electronic communications made at any time or place, which are addressed to:
  the OGF plenary session,
  any OGF working group or portion thereof,
  the OGF Board of Directors, the GFSG, or any member thereof on behalf of the OGF,
  the ADCOM, or any member thereof on behalf of the ADCOM,
  any OGF mailing list, including any group list, or any other list functioning under OGF auspices,
  the OGF Editor or the document authoring and review process
Statements made outside of a OGF meeting, mailing list or other function, that are clearly not intended to be input to an OGF activity, group or function, are not subject to these provisions.
Excerpt from Appendix B of GFD-C.1: Where the OGF knows of rights, or claimed rights, the OGF secretariat shall attempt to obtain from the claimant of such rights, a written assurance that upon approval by the GFSG of the relevant OGF document(s), any party will be able to obtain the right to implement, use and distribute the technology or works when implementing, using or distributing technology based upon the specific specification(s) under openly specified, reasonable, non-discriminatory terms. The working group or research group proposing the use of the technology with respect to which the proprietary rights are claimed may assist the OGF secretariat in this effort. The results of this procedure shall not affect advancement of document, except that the GFSG may defer approval where a delay may facilitate the obtaining of such assurances. The results will, however, be recorded by the OGF Secretariat, and made available. The GFSG may also direct that a summary of the results be included in any GFD published containing the specification.
OGF Intellectual Property Policies are adapted from the IETF Intellectual Property Policies that support the Internet Standards Process.

3 OGF20 Preservation Environments Research Group
Organizers: Reagan Moore (moore@sdsc.edu), Bruce Barkstrom
Goals:
  Present archives based on data grid technology
  INCIPIT virtual vellum
  Analyze capabilities required by a preservation environment
  Define rule-based preservation environment - iRODS
  NARA Electronic Records Archive capability requirements
  RLG/NARA assessment criteria for a Trusted Digital Repository
  Barkstrom GGF paper - based on NASA Langley preservation model
  Analyze capabilities that can be based on grid technology
  iRODS rule-oriented data system
Participants:
  19 contributors to data grid federation for GIN
  MIT - PLEDGE project on preservation policies
  SDSC - NARA research prototype persistent archive
  U Md - Producer Archive Workflow Network
  EU CASPAR, PLANETS; UK Digital Curation Centre

4 Virtual Vellum
Preserve shared collection of medieval manuscripts
  Preserve provenance of manuscripts (authenticity)
  Preserve display services (access)
  Preserve arrangement (respect des fonds)
  Preserve bits (integrity)
http://www.dcs.shef.ac.uk/~mikem/virtualvellum
Based on SRB data grid and Virtual Vellum services
P F Ainsworth

5 Preservation Environment Requirements
Bruce Barkstrom paper
  Described capabilities needed in a preservation environment
ERA capabilities list
  http://www.crl.edu/content.asp?l1=13&l2=58&l3=160
RLG/NARA trusted digital repository assessment criteria
  http://www.dlib.org.ar/dlib/july06/ross/07ross.html
Can we express these capabilities and assessment criteria as rules applied by the data grid?

6 ERA Capabilities
List of 854 required capabilities:
  Management of disposition agreements describing record retention and disposal actions
  Accession, the formal acceptance of records into the data management system
  Arrangement, the organization of the records to preserve a required structure (implemented as a collection/sub-collection hierarchy)
  Description, the management of descriptive metadata as well as text indexing
  Preservation, the generation of Archival Information Packages
  Access, the generation of Dissemination Information Packages
  Subscription, the specification of services that a user picks for execution
  Notification, the delivery of notices on service execution results
  Queuing of large scale tasks through interaction with workflow systems
  System performance and failure reports. Of particular interest is the identification of all failures within the data management system and the recovery procedures that were invoked.
  Transformative migration, the ability to convert specified data formats to new standards. In this case, each new encoding format is managed as a version of the original record.
  Display transformation, the ability to reformat a file for presentation.
  Automated client specification, the ability to pick the appropriate client for each user.

7 RLG/NARA TDR Assessment Criteria
The assessment criteria can be mapped to management policies.
The management policies can be mapped to a set of rules whose execution can be automated.
The rules require definition of input parameters that define the assertion being implemented.
The execution of the rules generates state information that can be evaluated to verify the assertion result.
The types of rules that are needed include:
  Specification of assertions (setting rule parameters - flags and descriptive metadata)
  Deferred consistency constraints that may be applied at any time
  Periodic rules that execute defined procedures
  Atomic rules applied on each operation (access controls, audit trails)
The rules determine the metadata attributes that need to be managed
Set of 174 rules

8 Digital Preservation
Preservation is communication with the future
  How do we migrate records onto new technology (information syntax, encoding format, storage infrastructure, access protocols)?
  SRB - Storage Resource Broker data grid provides the interoperability mechanisms needed to manage multiple versions of technology
Preservation manages communication from the past
  What information do we need from the past to make assertions about preservation assessment criteria (authenticity, integrity, chain of custody)?
  iRODS - integrated Rule-Oriented Data System

9 Socialization of Data Management
iRODS - integrated Rule-Oriented Data System

10 Preservation Management Policies
Authenticity
  Validate assertions made at time of data ingestion
  Validate existence of the descriptive (provenance) metadata
  Validate retention policy is consistent with submission agreement
Integrity
  Maintain information about the management of the data
  Assertions made by the archivist
  Access controls, audit trails, checksums, replication, synchronization, federation
Infrastructure independence
  Manage properties of records independently of choice of storage system
Scalability
  Manage large collections (billions of records, petabytes of data, thousands of attributes)
  Aggregations across name spaces
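The checksum and replication items listed under Integrity can be illustrated with a minimal sketch. It is not taken from SRB or iRODS: the function names, the replica names, and the choice of SHA-256 are all assumptions made for illustration, since the slide does not name a checksum algorithm.

```python
import hashlib
from typing import Dict, List


def checksum(data: bytes) -> str:
    """Return a hex digest for the given bytes (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def corrupted_replicas(registered: str, replicas: Dict[str, bytes]) -> List[str]:
    """Return the names of replicas whose checksum differs from the
    checksum registered at ingestion time."""
    return sorted(
        name for name, data in replicas.items() if checksum(data) != registered
    )


# Hypothetical record with two replicas, one of which has been damaged.
original = b"medieval manuscript scan"
registered_checksum = checksum(original)
replicas = {
    "demoResc": original,
    "nvoReplResc": b"medieval manuscript scam",  # one byte changed
}
bad = corrupted_replicas(registered_checksum, replicas)
print(bad)
```

A periodic rule of the kind described on the assessment-criteria slide could run such a comparison and record the result as persistent state.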

11 iRODS
Separate definition of management policies (rules) from definition of remote operations (micro-services)
Control execution of all micro-services through application of rules
Manage persistent state information for the results
Query the persistent state information to validate assertions on preservation properties

12 iRODS - integrated Rule-Oriented Data System
[Architecture diagram. Labeled components: Client Interface, Admin Interface, Resources, Rule Engine, Rule Invoker, Service Manager, Current State, Rule Base, Confs, Metadata Persistent Repository, Rule Modifier Module, Config Modifier Module, Metadata Modifier Module, Consistency Check Modules, Micro Service Modules (Resource-based Services), Micro Service Modules (Metadata-based Services)]

13 Managing Preservation Policies
Require at least six name spaces for managing identity
  Logical storage resource name space
  Logical user name space
  Logical file name space
  Logical rule name space
  Logical micro-service name space
  Logical persistent state name space
Require ability to federate name spaces
  Cross-register identity of object from each of the name spaces
Require multiple levels of aggregation for each name space
  Typically three levels of aggregation
Trust virtualization
  Ownership of the collection entities by the data grid

14 Metadata Attributes
Associate state information with each name space
User name
  Address, institution
  Group membership
  Type - (administrator, curator, owner, public)
Logical file name
  System attributes: location, size, owner, checksum, container, …
  User-defined attributes: descriptive information
Logical resource name
  Type of system
  Quotas

15 Federation Between Data Grids
[Diagram: two data grids, one holding Data Collection A and one holding Data Collection B, reached through Data Access Methods (Web Browser, DSpace, OAI-PMH). Each data grid is labeled with: Logical resource name space, Logical user name space, Logical file name space, Logical rule name space, Logical micro-service name, Logical persistent state]

16 Aggregation of Identifiers
Users {Single user, group, federation}
Resources {Single storage system, cached system, cluster}
Files {Single file, container, directory}
Metadata {Single attribute, hierarchical table, collection}
Management policies {Single capability, set of capabilities, nested rules}

17 Demonstration of Rules
Rule specified in four parts
  Single line, parts separated by the symbol |
  Name | condition | function-calls | recovery-calls
The four parts: name, conditions, function calls, recovery calls
Support multiple functions, separated by symbol ##
  acDeleteUser | | acDeleteDefaultCollections##msiDeleteUser##msiCommit | msiRollback##nop
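The four-part rule format described on this slide can be sketched as a small parser. This is an illustrative reading of the syntax shown here, not the iRODS rule engine: the `Rule` class and `parse_rule` function are hypothetical names, and the sketch assumes that the `|` symbol never occurs inside a condition or an argument.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    name: str
    condition: str
    calls: List[str]
    recoveries: List[str]


def split_calls(part: str) -> List[str]:
    """Split a '##'-separated list of calls, dropping surrounding spaces."""
    return [item.strip() for item in part.split("##") if item.strip()]


def parse_rule(line: str) -> Rule:
    """Split one rule line into name | condition | calls | recovery calls."""
    parts = [part.strip() for part in line.split("|")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {line!r}")
    name, condition, calls, recoveries = parts
    return Rule(name, condition, split_calls(calls), split_calls(recoveries))


# The example rule exactly as it appears on the slide.
rule = parse_rule(
    "acDeleteUser | | acDeleteDefaultCollections## msiDeleteUser## msiCommit"
    " | msiRollback## nop"
)
print(rule.name)
print(rule.calls)
print(rule.recoveries)
```

An empty condition parses to an empty string, which a rule engine would presumably treat as "always applies".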

18 Three Classes of Rules
Internal rules
  Used within iRODS for standard data manipulation services
Administrator rules
  Set by data grid administrator to enforce policies on shared collection
User-defined rules
  Support server-driven workflows

19 Rule-based Data Management
Associate rules with combinations of name spaces
  Rule set for a particular collection
  Rule set for a particular user group
  Rule set for a particular user group when accessing a particular collection
  Rule set for a particular storage system
  Rule set for a particular micro-service
Generic rules based on SRB operations

20 Administrative Rules
Currently 15 administrative rules
  Administrative
  Storage selection
  Data pre-processing
  Data post-processing
  Data deletion
  Parallel I/O

21 Administration Creation Rules
acCreateUser | | msiCreateUser##acCreateDefaultCollections##msiCommit | msiRollback##msiRollback##nop
acVacuum(*arg1) | | delayExec(msiVacuum,*arg1) | nop
acCreateDefaultCollections | | acCreateUserZoneCollections | nop
acCreateUserZoneCollections | | acCreateCollByAdmin(/$rodsZoneProxy/home,$otherUserName)##acCreateCollByAdmin(/$rodsZoneProxy/trash/home,$otherUserName) | nop##nop
acCreateCollByAdmin(*parColl,*childColl) | | msiCreateCollByAdmin(*parColl,*childColl) | nop

22 Administration Deletion Rules
acDeleteUser | | acDeleteDefaultCollections##msiDeleteUser##msiCommit | msiRollback##msiRollback##nop
acDeleteDefaultCollections | | acDeleteUserZoneCollections | nop
acDeleteUserZoneCollections | | acDeleteCollByAdmin(/$rodsZoneProxy/home,$otherUserName)##acDeleteCollByAdmin(/$rodsZoneProxy/trash/home,$otherUserName) | nop##nop
acDeleteCollByAdmin(*parColl,*childColl) | | msiDeleteCollByAdmin(*parColl,*childColl) | nop

23 Data Manipulation Rules
Rule for pre-processing on storage use
  acSetRescSchemeForCreate | | msiSetDefaultResc(demoResc,noForce)##msiSetRescSortScheme(random)##msiSetRescSortScheme(byRescType) | nop##nop##nop
Rule for pre-processing on data reads
  acPreprocForDataObjOpen | | msiSortDataObj(random) | nop
Rule for post processing data writes
  acPostProcForPut | | nop | nop
  acPostProcForCopy | | nop | nop
Rule for setting number of threads for parallel I/O
  acSetNumThreads | | msiSetNumThreads(default,default,default) | nop
Rule for data deletion policy setting
  acDataDeletePolicy | | nop | nop

24 iRODS Demonstration
Demonstrate generic put command
  ilsresc
  ils -l nvo
  iput -R demoResc ../src/icd.c nvo
  ils -l nvo
Revise put command to automatically create a replica
  cp core.irb.1 ../../../server/config/reConfigs/core.irb
  ils -l nvo
  iput -R demoResc ../src/ipwd.c nvo
  ils -l nvo
Illustrate execution of a user-defined rule
  icd
  iput carl.ged foo1
  irule -vF ruleInp3

25 iRODS Demonstration
# iRODS Rule Base - core.irb
# Each rule consists of four parts separated by |
# The four parts are: name, conditions, function calls, and recovery.
# The calls and recoveries can be multiple ones, separated by ##.
# For each rule, the number of recovery calls should match the calls;
# for example, if the 2nd call fails, the 2nd recovery call is made.
#
acPreprocForDataObjOpen | | msiSortDataObj(random) | nop
acSetRescSchemeForCreate | | msiSetDefaultResc(demo2Resc,noForce)##msiSetRescSortScheme(random)##msiSetRescSortScheme(byRescType) | nop##nop##nop
acDataDeletePolicy | | nop | nop
acPostProcForPut | | nop | nop
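The pairing of calls and recovery calls stated in the core.irb comments ("if the 2nd call fails, the 2nd recovery call is made") can be sketched as follows. This is a minimal illustration, not the iRODS implementation: `run_rule`, `succeed`, and `fail` are hypothetical helpers, and the sketch runs only the recovery at the failing position because the comments say nothing about whether earlier recoveries are also invoked.

```python
from typing import Callable, List, Tuple

# Each step pairs a call with the recovery call at the same position.
Step = Tuple[Callable[[], None], Callable[[], None]]

log: List[str] = []


def succeed(name: str) -> Callable[[], None]:
    """Hypothetical stand-in for a micro-service that succeeds."""
    return lambda: log.append(name)


def fail(name: str) -> Callable[[], None]:
    """Hypothetical stand-in for a micro-service that fails."""
    def call() -> None:
        log.append(name)
        raise RuntimeError(f"{name} failed")
    return call


def run_rule(steps: List[Step]) -> bool:
    """Run calls in order; on the first failure, run the recovery call
    at the same position and stop."""
    for call, recovery in steps:
        try:
            call()
        except Exception:
            recovery()
            return False
    return True


steps = [
    (succeed("call-1"), succeed("recovery-1")),
    (fail("call-2"), succeed("recovery-2")),
    (succeed("call-3"), succeed("recovery-3")),
]
succeeded = run_rule(steps)
print(succeeded, log)
```

With the second call failing, the third call is never attempted and only the second recovery runs, which matches the one case the comments describe.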


27 iRODS Demonstration
# iRODS Rule Base
# Each rule consists of four parts separated by |
# The four parts are: name, conditions, function calls, and recovery.
# The calls and recoveries can be multiple ones, separated by ##.
# For each rule, the number of recovery calls should match the calls;
# for example, if the 2nd call fails, the 2nd recovery call is made.
#
acPreprocForDataObjOpen | | msiSortDataObj(random) | nop
acSetRescSchemeForCreate | | msiSetDefaultResc(demo2Resc,noForce)##msiSetRescSortScheme(random)##msiSetRescSortScheme(byRescType) | nop##nop##nop
acDataDeletePolicy | | nop | nop
acPostProcForPut | $objPath like /tempZone/home/rods/nvo/* | msiSysReplDataObj(nvoReplResc) | nop
acPostProcForPut | | nop | nop
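The two acPostProcForPut entries in this rule base suggest that a conditional rule is tried before the unconditional fallback. A minimal sketch of that selection follows. It rests on assumptions the slide does not state: that rules are tried in file order, that the first rule whose condition holds is chosen, and that `like` treats `*` as a wildcard. The helper names are hypothetical, and only the `like` operator is handled.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Optional, Tuple

RuleEntry = Tuple[str, str, str]  # (name, condition, calls)

RULES: List[RuleEntry] = [
    ("acPostProcForPut", "$objPath like /tempZone/home/rods/nvo/*",
     "msiSysReplDataObj(nvoReplResc)"),
    ("acPostProcForPut", "", "nop"),
]


def condition_holds(condition: str, session: Dict[str, str]) -> bool:
    """Evaluate an empty condition or a '$var like pattern' condition."""
    if not condition:
        return True
    variable, operator, pattern = condition.split(None, 2)
    if operator != "like":
        raise ValueError(f"unsupported operator in sketch: {operator}")
    return fnmatchcase(session[variable.lstrip("$")], pattern)


def select_rule(name: str, session: Dict[str, str]) -> Optional[RuleEntry]:
    """Return the first rule with this name whose condition holds."""
    for rule in RULES:
        if rule[0] == name and condition_holds(rule[1], session):
            return rule
    return None


in_nvo = select_rule("acPostProcForPut",
                     {"objPath": "/tempZone/home/rods/nvo/ipwd.c"})
elsewhere = select_rule("acPostProcForPut",
                        {"objPath": "/tempZone/home/rods/foo1"})
print(in_nvo[2])
print(elsewhere[2])
```

Under these assumptions, a put into the nvo collection selects the replication call, while a put anywhere else falls through to the rule whose only call is nop.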

28 iRODS Demonstration
# This is an example of an input for the irule command.
# The first input line is the rule body.
# The second input line is the input parameter in the format of label=value.
# Multiple inputs can be specified using the '%' character as the separator.
# The third input line is the output description. For multiple outputs use '%'.
myTestRule | | msiDataObjOpen(*A,*S_FD)##msiDataObjCreate(*B,null,*D1_FD)##msiDataObjRead(*S_FD,100,*R1_BUF)##msiDataObjWrite(*D1_FD,*R1_BUF,*W1_LEN)##msiDataObjClose(*D1_FD,*junk2)##msiDataObjCreate(*C,null,*D2_FD)##msiDataObjRead(*S_FD,50000,*R2_BUF)##msiDataObjWrite(*D2_FD,*R2_BUF,*W2_LEN)##msiDataObjClose(*D2_FD,*junk3)##msiDataObjClose(*S_FD,*junk4)
*A=/tempZone/home/rods/foo1%*B=/tempZone/home/rods/foo2%*C=/tempZone/home/rods/foo3
*R1_BUF%*W2_LEN%*A
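The three-line input layout described in these comments (rule body, `%`-separated label=value inputs, `%`-separated outputs) can be sketched as a small reader. The function name `parse_irule_input` is hypothetical, the rule body in the sample is shortened for readability, and treating lines that start with `#` as comments is an assumption based on the slide.

```python
from typing import Dict, List, Tuple


def parse_irule_input(text: str) -> Tuple[str, Dict[str, str], List[str]]:
    """Return (rule body, input parameters, output names) from the
    three non-comment lines of an irule input file."""
    lines = [
        line.strip()
        for line in text.splitlines()
        if line.strip() and not line.lstrip().startswith("#")
    ]
    if len(lines) != 3:
        raise ValueError(f"expected 3 content lines, got {len(lines)}")
    body, inputs_line, outputs_line = lines
    inputs: Dict[str, str] = {}
    for item in inputs_line.split("%"):
        label, value = item.split("=", 1)
        inputs[label.strip()] = value.strip()
    outputs = [name.strip() for name in outputs_line.split("%")]
    return body, inputs, outputs


# Shortened sample: only the first and last calls of the slide's rule body.
SAMPLE = """\
# This is an example of an input for the irule command.
myTestRule | | msiDataObjOpen(*A,*S_FD)##msiDataObjClose(*S_FD,*junk4)
*A=/tempZone/home/rods/foo1%*B=/tempZone/home/rods/foo2%*C=/tempZone/home/rods/foo3
*R1_BUF%*W2_LEN%*A
"""

body, inputs, outputs = parse_irule_input(SAMPLE)
print(inputs)
print(outputs)
```

The rule body is returned as an unparsed string, so the reader does not depend on how many `|`-separated parts the body contains.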

29 iRODS Demonstration
Add and query metadata
  imeta add -d foo1 speed 100 "mph"
  imeta add -d foo1 length 200 "ft"
  imeta add -d foo2 speed 300 "mph"
  imeta add -d foo3 length 400 "ft"
  imeta ls -d foo1
  imeta qu -d speed = 100
  imeta qu -d speed ">=" 100
  imeta qu -d length ">=" 100
Copy metadata
  imeta ls -d foo1
  imeta ls -d foo3
  imeta cp -d -d foo1 foo3
  imeta ls -d foo3
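The attribute, value, and unit triples that these commands attach to files can be modeled with a toy in-memory catalog. This is only an illustration of the data model, not the imeta implementation: `MetadataCatalog` and its methods are hypothetical, values are compared as numbers (an assumption; the slide does not say how imeta compares them), and `copy` simply appends the source triples that the target lacks.

```python
import operator
from collections import defaultdict
from typing import Dict, List, Tuple

AVU = Tuple[str, str, str]  # (attribute, value, unit)

OPERATORS = {
    "=": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


class MetadataCatalog:
    """Toy catalog mapping object names to attribute-value-unit triples."""

    def __init__(self) -> None:
        self._avus: Dict[str, List[AVU]] = defaultdict(list)

    def add(self, obj: str, attribute: str, value: str, unit: str = "") -> None:
        self._avus[obj].append((attribute, value, unit))

    def ls(self, obj: str) -> List[AVU]:
        return list(self._avus.get(obj, []))

    def query(self, attribute: str, op: str, value: str) -> List[str]:
        """Return objects having the attribute with a matching value."""
        compare = OPERATORS[op]
        return sorted(
            obj
            for obj, avus in self._avus.items()
            if any(a == attribute and compare(float(v), float(value))
                   for a, v, _ in avus)
        )

    def copy(self, source: str, target: str) -> None:
        for avu in self.ls(source):
            if avu not in self._avus[target]:
                self._avus[target].append(avu)


catalog = MetadataCatalog()
catalog.add("foo1", "speed", "100", "mph")
catalog.add("foo1", "length", "200", "ft")
catalog.add("foo2", "speed", "300", "mph")
catalog.add("foo3", "length", "400", "ft")
print(catalog.query("speed", "=", "100"))
print(catalog.query("speed", ">=", "100"))
print(catalog.query("length", ">=", "100"))
catalog.copy("foo1", "foo3")
print(catalog.ls("foo3"))
```

The calls mirror the order of the commands on the slide, so the toy catalog can be checked against the output of the real demonstration.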

30 iRODS Demonstration
Copy metadata attributes on a file to a collection
  imeta ls -C /tempZone/home/rods
  imeta cp -d -C foo1 /tempZone/home/rods
  imeta ls -C /tempZone/home/rods

31 Preservation Environments Working group task
Define the sets of
  Assertions --> set of persistent state
  Management policies --> set of rules
  Capabilities --> set of micro-services
Solicit groups willing to contribute to development of rule-based technology
  CASPAR
  PLANETS
  NARA
  UK e-Science data grid
  IN2P3
  ARROW
  Fedora preservation working group
  DSpace

32 Preservation Interoperability
Preserve rules as property of each record
Register versions of micro-services used to manipulate each record
Register versions of persistent state information associated with each record
When migrating a record to a new preservation environment, migrate the rules, micro-services, and persistent state information

33 Preservation Evolution
Can define new
  Rules
  Micro-services
  Persistent state information
Can apply new rules in parallel with old rules, and take the most restrictive rule.
This means preservation management policies, capabilities, and assertions can evolve over time.

34 Theory of Digital Preservation
Definition of the persistent name spaces
Definition of the operations that are performed upon the persistent name spaces
Characterization of the changes to the persistent state information associated with each persistent name space that occur for each operation
Characterization of the transformations that are made to the records for each operation
Demonstration that the set of operations is complete, enabling the decomposition of every preservation process onto the operation set.
Demonstration that the preservation management policies are complete, enabling the validation of all preservation assessment criteria.
Demonstration that the persistent state information is complete, enabling the validation of assessment criteria.
The assertion is then: if the operations are reversible, then a future preservation environment can recreate a record in its original form, maintain authenticity and integrity, support access, and display the record.
A corollary is that such a system would allow records to be migrated between independent implementations of preservation environments, while maintaining authenticity and integrity.

35 More Information
moore@sdsc.edu
SRB: http://www.sdsc.edu/srb
iRODS: http://irods.sdsc.edu/

36 Full Copyright Notice
Copyright (C) Open Grid Forum (applicable years). All Rights Reserved.
This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works.
The limited permissions granted above are perpetual and will not be revoked by the OGF or its successors or assignees.

