Bryan Beecher, University of Michigan, Director, Computing & Network Services


1 Bryan Beecher, University of Michigan. Director, Computing & Network Services. E: bryan@umich.edu W: http://www.icpsr.umich.edu/ICPSR/staff/beecher.html/
Micah Altman, Harvard University. Archival Director, Henry A. Murray Research Archive; Associate Director, Harvard-MIT Data Center; Senior Research Scientist, Institute for Quantitative Social Sciences. E: micah_altman@harvard.edu W: http://maltman.hmdc.harvard.edu/

2 Roadmap
• Why replicate for preservation?
• What is the institutional model for replication that Data-PASS uses?
• How do we build on LOCKSS to support these institutional needs?
Collaborators and Conspirators
• Leonid Andreev, IQSS; Steve Burling, ICPSR; Jonathan Crabtree, Odum; Marc Maynard, Roper; Nancy McGovern, ICPSR

3
• Technical
  • Media failure: storage conditions, media characteristics
  • Format obsolescence
  • Preservation infrastructure software failure
  • Storage infrastructure software failure
  • Storage infrastructure hardware failure
• External Threats to Institutions
  • Third party attacks
  • Institutional funding
  • Change in legal regimes
• Quis custodiet ipsos custodes?
  • Unintentional curatorial modification
  • Loss of institutional knowledge & skills
  • Intentional curatorial deaccessioning
  • Change in institutional mission

4
• There are potential single points of failure in technology, organization, and legal regimes:
  • Diversify your portfolio: multiple software systems, hardware, organizations
  • Find diverse partners: diverse business models, legal regimes
http://failblog.org/2008/02/08/floppy-fail/

5
• Consider organizational credentials
• No organization is absolutely certain to be reliable
• Consider the trust relationships across institutions
http://flickr.com/photos/phauly/35555985/

6 Policy Driven
• Institutional policy creates formal replication commitments
• Replication commitments are described in metadata, using a schema
• Metadata drives
  • Configuration of replication network
  • Auditing of replication network

7 Asymmetric Commitments
Partners vary in …
• … storage commitments to replication
• … size of holdings being replicated
• … what holdings of other partners they replicate

8 Completeness
• Complete public holdings of each partner
• Retain previous versions of holdings
• Include
  • metadata
  • data
  • documentation
  • legal agreements

9 Restoration guarantees
• Restore groups of versioned content
  • to owning archive
  • to replication hosts
• Institutional failure restoration: transfer entire holdings of an archive to another

10 Trust & Verification
• Each partner is trusted …
  • … to hold the public content of other partners (and not to disseminate it improperly)
  • … to add units to be harvested
• No partner is trusted to be “super-user”
  • No deletion (or direct manipulation) of replication storage owned by another partner
• Legal agreements reinforce the trust model
• Schema-based auditing is used to …
  • … verify replication guarantees are met
  • … record replication and storage commitments
  • … document related TRAC criteria

11
• Network level
  • Identification: name; description; contact; access point URI
  • Capabilities: protocol version; number of replicates maintained; replication frequency; versioning/deletion support
  • Human-readable documentation: restrictions on content that may be placed in the network; services guaranteed by the network; Virtual Organization policies relating to network maintenance
• Host level
  • Identification: name; description; contact; access point URI
  • Capabilities: protocol version; storage available
  • Human-readable terms of use: documentation of hardware, software and operating personnel in support of TRAC criteria
• Archival unit level
  • Identification: name; description; contact; access point URI
  • Attributes: update frequency; plugin required for harvesting; storage required
  • Terms of use: required statement of content compliance with network terms; dissemination terms and conditions
• TRAC Integration
  • A number of elements comprise documentation showing how the replication system itself supports relevant TRAC criteria
  • Other elements may be used to include text, or reference external text, that documents evidence of compliance with TRAC criteria
  • Specific TRAC criteria are identified implicitly, and can be explicitly identified with attributes
  • Schema documentation describes each element's relevance to TRAC, and its mapping to particular TRAC criteria
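The three levels listed on this slide can be pictured as nested records. The following is only an illustrative sketch in Python: the field names paraphrase the slide's elements, while the class names, units (gigabytes, days) and the two helper methods are assumptions made for illustration and are not taken from the actual Data-PASS schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Identification:
    # Shared by all three levels on the slide.
    name: str
    description: str
    contact: str
    access_point_uri: str


@dataclass
class ArchivalUnit:
    ident: Identification
    update_frequency_days: int      # assumed unit
    plugin: str                     # plugin required for harvesting
    storage_required_gb: float      # assumed unit
    terms_of_use: str = ""


@dataclass
class Host:
    ident: Identification
    protocol_version: str
    storage_available_gb: float     # assumed unit
    terms_of_use: str = ""


@dataclass
class Network:
    ident: Identification
    protocol_version: str
    replicates_maintained: int
    replication_frequency_days: int  # assumed unit
    supports_versioning: bool
    hosts: List[Host] = field(default_factory=list)
    archival_units: List[ArchivalUnit] = field(default_factory=list)

    def storage_required_gb(self) -> float:
        """Storage needed to hold every AU at the committed replica count."""
        per_copy = sum(au.storage_required_gb for au in self.archival_units)
        return self.replicates_maintained * per_copy

    def storage_available_gb(self) -> float:
        """Total storage offered by all hosts."""
        return sum(h.storage_available_gb for h in self.hosts)
```

A record like this makes simple feasibility checks possible, for example comparing `storage_required_gb()` against `storage_available_gb()` before any harvesting is configured.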

12
• Initialization: given a schema instance, distribute AU harvesting responsibility to hosts
• Auditing: does the current host harvesting allocation & history match the replication commitment in the schema?
• Recovery of hosts
• Deliver AU content to source archive
• Addition of AUs, hosts
• Growth of AUs over initial commitment
Assumptions
• Nothing is deleted
• Resources in the network grow monotonically
• Off-the-path behavior is detected automatically, resolved manually
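The initialization and auditing steps on this slide can be sketched as two small functions. This is a hedged illustration, not the Data-PASS implementation: the greedy "most free space first" placement is a stand-in technique chosen for brevity, it ignores the asymmetric per-partner commitments described earlier, and the function names and dictionary shapes are hypothetical.

```python
from typing import Dict, List


def allocate(aus: Dict[str, float], hosts: Dict[str, float],
             replicas: int) -> Dict[str, List[str]]:
    """Assign each AU to `replicas` distinct hosts (greedy, hypothetical).

    aus maps AU name -> storage required; hosts maps host name -> storage
    available. Largest AUs are placed first, on the hosts with most free space.
    """
    free = dict(hosts)
    plan: Dict[str, List[str]] = {h: [] for h in hosts}
    for au, size in sorted(aus.items(), key=lambda kv: (-kv[1], kv[0])):
        candidates = sorted((h for h in free if free[h] >= size),
                            key=lambda h: (-free[h], h))
        if len(candidates) < replicas:
            raise ValueError(f"cannot place {replicas} copies of {au!r}")
        for h in candidates[:replicas]:
            plan[h].append(au)
            free[h] -= size
    return plan


def audit(aus: Dict[str, float], plan: Dict[str, List[str]],
          replicas: int) -> Dict[str, int]:
    """Return AUs held by fewer than `replicas` hosts, with the shortfall."""
    counts = {au: 0 for au in aus}
    for held in plan.values():
        for au in set(held):          # count each host at most once per AU
            if au in counts:
                counts[au] += 1
    return {au: replicas - n for au, n in counts.items() if n < replicas}
```

An empty audit result means the current allocation meets the commitment; a non-empty one lists what must be resolved, which under the slide's assumptions would be handled manually.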

13

14

15
LOCKSS
• Very easy to build and deploy
  • 5 minutes
• Very easy to plug into the public LOCKSS network
  • 5 minutes
• Very easy to manage thereafter; it is basically an appliance
CLOCKSS
• Also easy to set up
• Grouping your CLOCKSS devices into a private network and paring the 20k-line configuration file into the right 200-line configuration file is not
• Managing a network now, not a device

16
• Standard LOCKSS used for
  • Harvesting
  • Recovery
• New LOCKSS bulk update mechanism used for
  • Initial configuration
  • Adding AUs
• CLOCKSS mechanisms (certificates, cache monitor)
  • Content delivery
  • Optimize recovery
  • Auditing
• Data-PASS customizations for schema processing
  • Translating schema instance into bulk update requests
  • Reporting on compliance based on cache monitor database
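The step of translating a schema instance into bulk update requests can be pictured as a difference between what the schema commits each host to hold and what each host currently holds. The slides do not show the LOCKSS bulk update format, so the request records below are entirely hypothetical; the sketch only illustrates the additive, nothing-is-deleted logic.

```python
from typing import Dict, List, Set, Tuple


def plan_bulk_updates(
    desired: Dict[str, Set[str]],
    current: Dict[str, Set[str]],
) -> Tuple[List[dict], List[dict]]:
    """Compare committed AUs per host with AUs actually held.

    Returns (requests, anomalies). Requests only ever add AUs, because the
    design assumes nothing is deleted. AUs held without a commitment are
    reported as anomalies for manual resolution rather than removed.
    The dict layout of both lists is a hypothetical format.
    """
    requests: List[dict] = []
    anomalies: List[dict] = []
    for host in sorted(set(desired) | set(current)):
        want = desired.get(host, set())
        have = current.get(host, set())
        for au in sorted(want - have):
            requests.append({"host": host, "action": "add_au", "au": au})
        for au in sorted(have - want):
            anomalies.append({"host": host, "au": au,
                              "issue": "held but not committed"})
    return requests, anomalies
```

Keeping the anomalies separate from the requests mirrors the stated assumption that off-the-path behavior is detected automatically but resolved manually.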

17
• Summer 2007: Attended the MetaArchive LOCKSS tutorial
  • Very good overview of LOCKSS
• Summer 2007: SSP System Requirements Developed & Approved
• Winter 2007: First public LOCKSS network nodes built at two Data-PASS sites
• Winter 2007: SSP Replication Commitment Schema Developed
• Spring 2008: Completed test harvest of MRA collection into LOCKSS
• Spring 2008: SSP System Use Cases Developed
• Spring 2008: Prototype plugin developed to harvest Dataverse Networks
• Spring 2008: Data-PASS sites are joined into a single Private LOCKSS Network (PLN)
• Spring 2008: Met with LOCKSS developers to review use cases
  • SSP will leverage functionality “in the works” by the LOCKSS team

18

19
• Replication ameliorates institutional risks to preservation
• Data-PASS requires policy-based, auditable, asymmetric replication commitments
• Formalize policy in schema
• (Re)Configure & audit LOCKSS using schema
• Replication uses standard LOCKSS mechanisms

