Bryan Beecher University of Michigan Director, Computing & Network Services E: W:

Slides:



Advertisements
Similar presentations
A Community Approach to Preservation: Experiences with Social Science Data ASIST Summit 2010 Jonathan Crabtree April 9, 2010.
Advertisements

ETD Preservation Workshop Session Four: Collection Management for Preservation Gail McMillan, Virginia Tech.
Pulling it all together… with thanks to Sheila Anderson.
The Data-PASS Partnership: Collaboration, Agreements, and More Myron Gutmann ICPSR University of Michigan.
Software Quality Assurance Plan
PREMIS in Thought: Data Center for LC Digital Holdings Ardys Kozbial, Arwen Hutt, David Minor February 11, 2008.
ICPSR and the Data Seal of Approval: A Case Study Mary Vardigan Assistant Director, ICPSR October 8, 2013.
Trustworthy Repository Criteria, Virtual Organizations, and Infrastructure MacKenzie Smith, MIT Libraries NDIIPP Meeting, July 2010.
AN OPEN-SOURCE SYSTEM FOR AUTOMATIC POLICY-BASED COLLABORATIVE ARCHIVAL REPLICATION Using the SafeArchive System The SafeArchive System coordinates six.
Micah Altman MIT Libraries Jonathan Crabtree Odum Institute UNC Chapel Hill Prepared for Aligning Digital Preservation across Nations Amsterdam 2013 Website:
Title Subtitle. Building Relationships: “A Foundation for Digital Archives” Digital Object Repository Systems in Digital Libraries (DORSDL) September.
Replicated & Distributed Storage Technologies : “Impact on Social Science Data Archive Policies” IASSIST 2010 Ithaca, New York Jonathan Crabtree June.
A Community Approach to Preservation: “Experiences with Social Science Data” Community Approaches to Digital Preservation 2009 Jonathan Crabtree February.
Data - Information - Knowledge
Developing a Records & Information Retention & Disposition Program:
Chronopolis: Preserving Our Digital Heritage David Minor UC San Diego San Diego Supercomputer Center.
Collaborative Digital Preservation with LOCKSS Gail McMillan Digital Library and Archives, University Libraries Virginia Polytechnic Institute and State.
Archiving our Social Science Digital History ECURE 2005 March 1, 2005.
Data-PASS/NDIIPP: A new effort to harvest our history IASSIST/IFDO 2005 May, 25, 2005.
DITSCAP Phase 2 - Verification Pramod Jampala Christopher Swenson.
AIIM Presentation Selecting and Implementing A Records Management System June 5, 2008.
System Design/Implementation and Support for Build 2 PDS Management Council Face-to-Face Mountain View, CA Nov 30 - Dec 1, 2011 Sean Hardman.
Data-PASS Shared Catalog Micah Altman & Jonathan Crabtree 1 Micah Altman Harvard University Archival Director, Henry A. Murray Research Archive Associate.
OCLC Online Computer Library Center A Global OpenURL Resolver Registry Phil Norman OCLC Dlsr4lib Workshop March 23 rd, 2006 Arlington VA.
Module 9 Configuring Server Security Compliance. Module Overview Securing a Windows Infrastructure Overview of EFS Configuring an Audit Policy Overview.
Keeping your Archive Safe (and on TRAC) with SafeArchive and LOCKSS Thu-Mai Christian [Slides] Micah Altman Jonathan Crabtree [Project Directors]
Concepts of Database Management Sixth Edition
Repository Requirements and Assessment August 1, 2013 Data Curation Course.
System Center 2012 Certification and Training May 2012.
IT Infrastructure Chap 1: Definition
Digital Preservation through Cooperation: LOCKSS Gail McMillan Digital Library and Archives, University Libraries Virginia Polytechnic Institute and State.
Electronic Records Management: A Checklist for Success Jesse Wilkins April 15, 2009.
Trustworthy Repositories, Organizations & Infrastructure Micah Altman, Institute for Quantitative Social Science, Harvard University Jonathan Crabtree,
DigCCurr Professional Institute: Curation Practices for the Digital Object Lifecycle Digital Curation Program Development Nancy Y McGovern Research Assistant.
1 Schema Registries Steven Hughes, Lou Reich, Dan Crichton NASA 21 October 2015.
Libraries, Archives, and Digital Preservation: The Reality of What We Must Do Leslie Johnston Acting Director, National Digital Information Infrastructure.
Micah Altman Associate Director, Harvard-MIT Data Center Institute for Quantitative Social Science, Harvard University Bryan Beecher Director of Computing.
The Canadian Information Network for Research in the Social Sciences and Humanities Tim Au Yeung and Mary Westell Libraries.
IS 325 Notes for Wednesday August 28, Data is the Core of the Enterprise.
E.Soundararajan R.Baskaran & M.Sai Baba Indira Gandhi Centre for Atomic Research, Kalpakkam.
Report on Preservation of ETDs: The LOCKSS Prototype The work of Kamini Santhanagopalan Virginia Tech Graduate Student in Computer Science Reported at.
Archival Workshop on Ingest, Identification, and Certification Standards Certification (Best Practices) Checklist Does the archive have a written plan.
Data-PASS Partners: Plans for Moving Forward Marc Maynard The Roper Center for Public Opinion Research University of Connecticut July 2010.
OAIS Rathachai Chawuthai Information Management CSIM / AIT Issued document 1.0.
NDSR Boston webinar: Digital Preservation Introduction Presenter: Nancy Y McGovern October 2015.
Introduction and Overview of Information Security and Policy By: Hashem Alaidaros 4/10/2015 Lecture 1 IS 332.
Data Citation in The Dataverse Network ® Micah Altman, Institute for Quantitative Social Science, Harvard University Prepared for the Board on Research.
Martin Halbert MetaArchive Cooperative Thursday, June 25, 2009 NDIIPP Annual Meeting Washington, D.C.
Archiving and Preservation Michele Kimpton CEO, DuraSpace Bryan Beecher Director, ICPSR DuraSpace Webinar November 2, 2011.
Aligning Digital Preservation Policies with Community Standards Nancy McGovern Digital Preservation Officer.
Distributed Digital Preservation Networks Across a Region, Across a State: Stretching LOCKSS Gail McMillan, Virginia Tech Martin Halbert, Emory Aaron Trehub,
Digital Preservation through Cooperation: LOCKSS Gail McMillan Digital Library and Archives, University Libraries Virginia Polytechnic Institute and State.
SEDAC Long-Term Archive Development Robert R. Downs Socioeconomic Data and Applications Center Center for International Earth Science Information Network.
Microsoft Azure and ServiceNow: Extending IT Best Practices to the Microsoft Cloud to Give Enterprises Total Control of Their Infrastructure MICROSOFT.
Big Data Analytics Are we at risk? Dr. Csilla Farkas Director Center for Information Assurance Engineering (CIAE) Department of Computer Science and Engineering.
Active Directory Domain Services (AD DS). Identity and Access (IDA) – An IDA infrastructure should: Store information about users, groups, computers and.
Katherine Skinner, Martin Halbert & Matt Schultz Educopia Institute and MetaArchive Cooperative NDSA Infrastructure Committee
Digital Preservation MetaArchive Cooperative, Digital Preservation Policy Planning Workshop Boston College, Boston, MA October 26, 2010.
Cofax Scalability Document Version Scaling Cofax in General The scalability of Cofax is directly related to the system software, hardware and network.
A Solution for Maintaining File Integrity within an Online Data Archive Dan Scholes PDS Geosciences Node Washington University 1.
Trustworthiness of Preservation Systems
An Overview of Data-PASS Shared Catalog
11/16/2018 © 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks.
Datacastle RED Delivers a Proven, Enterprise-Class Endpoint Data Protection Solution that Is Scalable to Millions of Devices on the Microsoft Azure Platform.
Dell Data Protection | Rapid Recovery: Simple, Quick, Configurable, and Affordable Cloud-Based Backup, Retention, and Archiving Powered by Microsoft Azure.
Metadata The metadata contains
Robin Dale RLG OAIS Functionality Robin Dale RLG
Nancy Y. McGovern Digital Preservation Officer, ICPSR IASSIST 2007
Microsoft Data Insights Summit
Presentation transcript:

Bryan Beecher University of Michigan Director, Computing & Network Services E: W: Micah Altman Harvard University Archival Director, Henry A. Murray Research Archive Associate Director, Harvard-MIT Data Center Senior Research Scientist, Institute for Quantitative Social Sciences E: W:

 Roadmap  Why replicate for preservation?  What is institutional model for replication in Data- PASS use?  How do we build on LOCKSS to support these institutional needs?  Collaborators and Conspirators  Leonid Andreev, IQSS; Steve Burling, ICPSR; Jonathan Crabtree, Odum; Marc Maynard, Roper; Nancy McGovern, ICPSR;

 Technical  Media failure: storage conditions, media characteristics  Format obsolescence  Preservation infrastructure software failure  Storage infrastructure software failure  Storage infrastructure hardware failure  External Threats to Institutions  Third party attacks  Institutional funding  Change in legal regimes  Quis custodiet ipsos custodes?  Unintentional curatorial modification  Loss of institutional knowledge & skills  Intentional curatorial deaccessioning  Change in institutional mission

 There are potential single points of failure in both technology, organization and legal regimes:  Diversify your portfolio: multiple software systems, hardware, organization  Find diverse partners – diverse business models, legal regimes

 Consider organizational credentials  No organization is absolutely certain to be reliable  Consider the trust relationships across institutions

 Policy Driven  Institutional policy creates formal replication commitments  Replication commitments are described in metadata, using schema  Metadata drives  Configuration of replication network  Auditing of replication network

 Asymmetric Commitments Partners vary in …  … storage commitments to replication  … size of holdings being replicated  … what holdings of other partners they replicate

 Completeness  Complete public holdings of each partner  Retain previous version of holdings  Include  metadata  data  documentation  legal agreements

 Restoration guarantees  Restore groups of versioned content  to owning archive  to replication hosts  Institutional failure restoration: – transfer entire holdings of an archive to another

 Trust & Verification  Each partner is trusted …  … to hold the public content of other (not to disseminate improperly)  … to add units to be harvested  No partner is trusted to be “super-user” No deletion (or directly manipulation of replication storage owned by another partner  Legal agreements reinforce trust model  Schema based auditing used to …  … verify replication guarantees are met  … record replication and storage commitments  … document related TRAC criteria

 Network level:  Identification: name; description; contact; access point URI  Capabilities: protocol version; number of replicates maintained; replication frequency; versioning/deletion support  Human readable documentation: restrictions on content that may be placed in the network; services guaranteed by the network; Virtual Organization policies relating to network maintenance  Host level  Identification: name; description; contact; access point URI  Capabilities: protocol version; storage available  Human readable terms of use: Documentation of hardware, software and operating personnel in support of TRAC criteria  Archival unit level  Identification: name; description; contact; access point URI  Attributes: update frequency, plugin required for harvesting, storage required  Terms of use: Required statement of content compliance with network terms. ; Dissemination terms and conditions  TRAC Integration  A number of elements comprise documentation showing how the replication system itself supports relevant TRAC criteria  Other elements that may be use to include text, or reference external text that documents evidence of compliance with TRAC criteria.  Specific TRAC criteria are identified implicitly, can be explicitly identified with attributes  Schema documentation describes each elements relevance to TRAC, and mapping to particular TRAC criteria

 Initialization: Given schema instance distribute AU harvesting responsibility to hosts  Auditing: Does current host harvesting allocation & history match replication commitment in schema?  Recovery of hosts  Deliver AU content to source archive  Addition of AU’s, hosts  Growth of AU’s over initial commitment Assumptions  Nothing is deleted  Resources in network grow monotonically  Off-the-path behavior is detected automatically, resolved manually

LOCKSSCLOCKSS  Very easy to build and deploy  5 minutes  Very easy to plug into public LOCKSS network  5 minutes  Very easy to manage thereafter – it is basically an appliance  Also easy to set-up  Grouping your CLOCKSS devices into a private network and paring the 20k- line configuration file into the right 200-line configuration file is not  Managing a network now, not a device

 Standard LOCKSS used for  Harvesting  Recovery  New LOCKSS bulk update mechanism used for  Initial configuration  Adding AU’s  CLOCKSS mechanisms (certificates, cache monitor)  Content delivery  Optimize recovery  Auditing  Data-PASS customizations for schema processing  Translating schema instance into bulk update requests  Reporting on compliance based on cache monitor database

 Summer 2007: Attended the MetaArchive LOCKSS tutorial  Very good overview of LOCKSS  Summer 2007: SSP System Requirements Developed & Approved  Winter 2007: First public LOCKSS network nodes built at two Data-PASS sites  Winter 2007: SSP Replication Commitment Schema Developed  Spring 2008: Completed Test harvest of MRA collection into LOCKSS  Sprint 2008: SSP System Use Cases Developed  Spring 2008: Prototype plugin developed to harvest Dataverse Networks  Spring 2008: Data-PASS sites are joined into single Private LOCKSS Network (PLN)  Spring 2008: Met with LOCKSS developers to review use cases  SSP will leverage functionality “in the works” by LOCKSS team

 Replication ameliorates institutional risks to preservation  Data PASS requires policy based, auditable, asymmetric replication commitments  Formalize policy in schema  (Re)Configure & audit LOCKSS using schema  Replication uses standard LOCKSS mechanisms