Repository Development Center Office of Strategic Initiatives Releasing Open Source at the Library of Congress Leslie Johnston 2009 LITA Forum.

Presentation transcript:


Starting Down a Path Towards Better Control
• What are our most basic needs? What is the first step?
• How do we know what we have, where it is, and who it belongs to?
• How do we get files, both new and legacy, from where they are to where they need to be?

Identifying the Transfer Problem Space
• As part of its first-phase repository development, the Library of Congress is working on solutions for a category of activities that we refer to as “Transfer.” At a high level, we define transfer as including the following human- and machine-performed tasks:
  • Adding digital content to the collections, whether from an external partner or created at LC;
  • Moving digital content between storage systems (external and internal);
  • Reviewing digital files for fixity, quality, and/or authoritativeness; and
  • Inventorying and recording transfer life cycle events for digital files.

Recent Transfer Experience
During 2008 the Library of Congress received:
• 30 TB from NDIIPP preservation partners, 20 TB in Web Capture crawls to preserve identified web sites, 30 TB from National Digital Newspaper Program (NDNP) partners, and 1 TB from World Digital Library partners. Single transfers retrieved over the network ranged from 20 MB to over 2 TB.
• Dozens of hard drives with licensed, partner, and vendor-supplied content.
• All forms of content: some to be dark archived for preservation, some limited to Library use, and some to be made publicly available.
• There is also newly digitized internal content that has to be managed.

Develop a Standard and Tools to Optimize Transfers
• Motivating use cases:
  • Transfer of content internally and between preservation partners.
  • Long-term storage of content.
• Needs:
  • Minimally self-identifying and self-describing packages.
  • Support for error detection and transfer optimization.
• Characteristics:
  • Low overhead.
  • Content-type agnostic.
  • Supported by off-the-shelf, easily supported tools.
BagIt: a packaging specification for file transfers. It supports minimally self-identifying and self-describing packages, with support for error detection and transfer optimization.

What’s in a Bag?
• Package description: bag-info.txt
• /data directory with contents
• Manifest of contents with checksums
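The three parts named on this slide can be illustrated with a minimal sketch in Python. This is not the Library's tooling: the function name `make_bag` and the sample metadata are hypothetical, MD5 is chosen only for brevity, and the version string in the declaration is an assumption. The sketch also writes the `bagit.txt` declaration that the specification requires in addition to the parts listed above.

```python
import hashlib
import tempfile
from pathlib import Path


def make_bag(bag_dir, payload, info):
    """Write a minimal bag: declaration, bag-info.txt, data/, and an MD5 manifest.

    payload maps relative file names to bytes; info maps bag-info labels to values.
    (Hypothetical helper for illustration only.)
    """
    bag_dir = Path(bag_dir)
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    # Bag declaration; the version string used here is an assumption.
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n", encoding="utf-8"
    )

    # Package description, written as "Label: value" lines.
    info_lines = [f"{label}: {value}" for label, value in info.items()]
    (bag_dir / "bag-info.txt").write_text("\n".join(info_lines) + "\n", encoding="utf-8")

    # Payload files go under data/; the manifest holds one checksum line per file.
    manifest_lines = []
    for name in sorted(payload):
        target = data_dir / name
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload[name])
        digest = hashlib.md5(payload[name]).hexdigest()
        manifest_lines.append(f"{digest}  data/{name}")
    (bag_dir / "manifest-md5.txt").write_text(
        "\n".join(manifest_lines) + "\n", encoding="utf-8"
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp) / "example-bag"
        make_bag(bag, {"hello.txt": b"hello"}, {"Source-Organization": "Example Library"})
        print(sorted(p.name for p in bag.iterdir()))
```

Because the manifest sits next to the payload, a recipient can check the package without any out-of-band information, which is the "self-describing" property the specification aims for.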

Transfer Tool Development
To promote the use of BagIt in the Library and outside, tools were required to make the specification easy to use.
• Parallel Retriever script: efficient package transfer.
• Validation script: validates Bags against the BagIt specification.
• VerifyIt script: verifies that files are uncorrupted.
• BagIt Java Library (BIL): used for application and command-line tool development.
• Bagger desktop application: graphical desktop tool to create, update, and validate Bags.
• LocDrop web application: supports partner registration of transfers, whether shipping a hard drive or sending over the network.
• Inventory System: records lifecycle events for packages of Bags and files.
• Workflow Tools
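The kind of check a verification script performs can be sketched as follows. This is an illustration in Python under our own assumptions, not the VerifyIt script itself: the function name `verify_manifest` and its return shape are hypothetical, and the manifest is assumed to contain one "checksum path" pair per line.

```python
import hashlib
import tempfile
from pathlib import Path


def verify_manifest(bag_dir, algorithm="md5"):
    """Return a list of (path, problem) tuples; an empty list means every file matched.

    Hypothetical helper: reads manifest-<algorithm>.txt and recomputes each checksum.
    """
    bag_dir = Path(bag_dir)
    problems = []
    manifest = bag_dir / f"manifest-{algorithm}.txt"
    for line in manifest.read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        expected, rel_path = line.strip().split(maxsplit=1)
        target = bag_dir / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
            continue
        # Hash in chunks so large payload files do not have to fit in memory.
        digest = hashlib.new(algorithm)
        with target.open("rb") as handle:
            for chunk in iter(lambda: handle.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != expected.lower():
            problems.append((rel_path, "checksum mismatch"))
    return problems


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        bag = Path(tmp)
        (bag / "data").mkdir()
        (bag / "data" / "a.txt").write_bytes(b"hello")
        checksum = hashlib.md5(b"hello").hexdigest()
        (bag / "manifest-md5.txt").write_text(f"{checksum}  data/a.txt\n", encoding="utf-8")
        print(verify_manifest(bag))
```

Returning the full list of problems, rather than stopping at the first one, lets a receiver report every damaged or missing file from a multi-terabyte transfer in a single pass.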

Transfer Tool Development: Bagger
Bagger: Graphical Bag Authoring Tool
• Allows users to create generic Bags or Bags that meet specified project profiles.
• Provides project-specific templates that enforce project Bag descriptive metadata requirements.
• Built on top of the BagIt Java Library.
• Presents a range of options for compressed serialization and complete versus “holey” bags.
• Java Web Start version automatically checks for the most recent version to keep the tool updated.
• Standalone version is bundled with all necessary software and runs without requiring installation privileges.
• Runs on a PC or Mac.

Using Bagger
• Create and select a profile
• Add files to the /data directory
• Enter bag-info metadata

Using Bagger
• Completed bag with generated manifest

LocDrop Tool Development
• LocDrop is designed to support notification for transfers of content into the Library of Congress, both from outside the Library and within the Library itself. The application currently lets you register network and physical media transfers (hard drives, CDs, DVDs, etc.) that the Library will retrieve. In later versions we expect to add the ability to launch network transfers directly.
• LocDrop will simplify the processes for tracking content we expect to receive. Over time, we expect to connect this application to related services that will continually improve how we manage the transfer and receipt of materials from all sources.

Using LocDrop
• Register the information needed to track data shipments to and from the Library

Using LocDrop
• Register the information needed for the Library to retrieve network transfers

Inventory Tool Development
• Record Package Events
  • Examples of Package Events include “Package Received Events,” which are recorded when a project receives a package, and “Package Accepted Events,” which are recorded when a project accepts curatorial responsibility for a package.
• Record File Events
  • Examples of File Events include “File Copy Events,” which are recorded when a package is copied from one File Location to another, and “Quality Review Events,” which are recorded when quality review is performed.
• For legacy collections, the Inventory Tool can be pointed at existing file systems and directories to package, checksum, and record life cycle events, bringing the files under initial control.
• The Inventory Tool is implemented on top of our BIL Java Library.
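One way to picture the event records described above is as an append-only log keyed by package or file. The Python sketch below is an assumption-laden illustration, not the Inventory Tool's data model: the class names `LifecycleEvent` and `Inventory`, the field names, and the sample identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class LifecycleEvent:
    """One recorded event for a package or file (hypothetical structure)."""
    subject: str      # package or file identifier
    event_type: str   # e.g. "Package Received", "File Copy", "Quality Review"
    project: str      # project that recorded the event
    detail: str = ""
    recorded_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


class Inventory:
    """In-memory, append-only event log; a real system would persist these records."""

    def __init__(self):
        self._events = []

    def record(self, subject, event_type, project, detail=""):
        event = LifecycleEvent(subject, event_type, project, detail)
        self._events.append(event)
        return event

    def history(self, subject):
        """Return all events for one package or file, in the order recorded."""
        return [event for event in self._events if event.subject == subject]


if __name__ == "__main__":
    inventory = Inventory()
    inventory.record("pkg-001", "Package Received", "example-project")
    inventory.record("pkg-001", "File Copy", "example-project", "staging -> archive")
    inventory.record("pkg-001", "Package Accepted", "example-project")
    for event in inventory.history("pkg-001"):
        print(event.event_type, event.detail)
```

Keeping events immutable and append-only means the history of a package answers the questions posed earlier in the talk: what we have, where it has been, and who accepted responsibility for it.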

Using the Inventory Tool
• Running an Inventory operation

Using the Inventory Tool
• Searching the Inventory, plus auditing, file count, space usage, and project-specific Inventory reports

Workflow Development
• The Transfer components and Inventory Tool are tied together through multiple project-based Workflow systems.
• Through case study development with stakeholders, we identify the data flow and tasks to be performed.
• Workflow tasks formalized through the system include transfer, validation by a format validation application, manual quality review inspection, and file copying to archival storage and production storage.
• A workflow UI allows users to initiate, monitor, and administer processes, and to notify the workflow engine of the outcome of manual tasks, including task completion.
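The task sequence and the reporting of manual outcomes described above can be sketched as a small state machine. This Python example is only an illustration of the idea: the `Workflow` class, its method names, and the status values are hypothetical and do not describe the Library's workflow engine.

```python
# Task names taken from the slide; the ordering is an assumption.
TASKS = [
    "transfer",
    "format validation",
    "manual quality review",
    "copy to archival storage",
    "copy to production storage",
]


class Workflow:
    """Steps through an ordered task list, waiting at each task for a reported outcome."""

    def __init__(self, package_id, tasks):
        self.package_id = package_id
        self.tasks = list(tasks)
        self.position = 0
        self.status = "running" if self.tasks else "completed"
        self.log = []  # (task, succeeded, note) tuples, in order

    @property
    def current_task(self):
        return self.tasks[self.position] if self.status == "running" else None

    def report(self, succeeded, note=""):
        """Record the outcome of the current task (automated or manual) and advance."""
        if self.status != "running":
            raise RuntimeError(f"workflow is {self.status}; no task is awaiting an outcome")
        self.log.append((self.current_task, succeeded, note))
        if not succeeded:
            self.status = "failed"
        elif self.position + 1 == len(self.tasks):
            self.status = "completed"
        else:
            self.position += 1


if __name__ == "__main__":
    workflow = Workflow("pkg-001", TASKS)
    workflow.report(True)                      # transfer finished
    workflow.report(True)                      # validation passed
    workflow.report(True, "reviewed by staff") # manual task outcome from the UI
    print(workflow.current_task, workflow.status)
```

Treating a manual review the same way as an automated step, as an outcome reported to the engine, is what lets a single UI monitor both kinds of task.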

Running a Workflow
• Starting, searching, and monitoring workflows

Running a Workflow
• Updating an in-progress workflow

Initiating the Open Source Release
• It was decided that the three utility scripts, the key tools needed for the movement and validation of Bagged content, should be the first candidates for open source release.
• The scripts were submitted to the Office of General Counsel at the Library for review. This review included close scrutiny by the attorneys in the office of everything from purpose (automating a process) to originality (determining that no code came from any other licensed sources) to authorship (Library staff versus Library contractors).
• Because contractual obligations with a contracting company prohibited a straightforward public domain release, the three scripts were released on SourceForge in December 2008 under a BSD license.

Continuing the Open Source Release
• The next vital release had to be BIL (the BagIt Library), a Java library developed to support Bag services.
• A barrier to uptake of the BagIt specification was the lack of a way to automate the Bagging process and to support the development of tools. BIL supports key functionality such as creating, manipulating, validating, and verifying Bags, as well as the uploading of Bags using the SWORD deposit protocol.
• The review of BIL for open source release by the Office of General Counsel was a more complex affair. There was a single author, who was a Library staff member, but there were thirteen bundled dependencies, each with its own license to be reviewed.
• BIL was released into the public domain with the understanding that those licenses restricted any bundling of BIL and its dependencies into new tools by others, but in no way restricted the release.
• BIL was released as both compiled and source code in June 2009.

Managing the Release
• At the time of both releases the Library made a conscious decision to release only the code, and not to take advantage of the SourceForge functionality that supports the committing of code back into the project.
• These were three relatively simple scripts, and it seemed to make the most sense to release them and let others work with them or use them to model their own development.
• No one was available at the time who could devote the effort needed to manage a full-blown open source project.
• The scripts can be updated by anyone in the community for their own use. The Library has committed to releasing its updates to BIL. Updates to the source code are expected and welcome through the Digital Curation group.

Upcoming Releases
• The Bagger application is nearing the completion of its development and partner testing. Bagger is meant to provide a graphical desktop tool for the Bagging of content, ideally requiring no client-side IT support or infrastructure.
• It is implemented as a Java Web Start application for use across platforms, as well as a standalone version with its own bundled, stripped-down JRE. It supports the aggregation of files into Bag packages, including the creation of checksum manifests and Bag information files. It is developed on top of BIL.
• The Bagger review includes the proposed release of three variants (the Java Web Start version and standalone versions for the PC and Mac) as well as the source code.
• The review encompasses a number of bundled dependencies, including the redistribution license for Java.

Building a Community
• The BagIt specification was posted on the Library of Congress and California Digital Library sites and as an Internet “Request for Comments” (RFC).
• The BagIt specification will also be released on SourceForge to promote wider dissemination, discussion, and community building.
• BagIt and the tools have been promoted to partners from three different initiatives, blogged, tweeted, shared on Facebook, presented at conferences, described in the Library’s Digital Preservation Newsletter, announced on listservs, discussed in a Google group, and written up in journal articles.
• The team launched a Digital Curation Google group in part to support the activities of this increasingly participatory community and encourage open, public discussion.
• The most effective strategy for building a community was use of BagIt by the NDIIPP partner institutions. NDIIPP strongly encouraged partners to “bag” their content for their preservation transfers to the Library.

Building a Community
• The Library moved into new modes of promotion and community building, including development of an introductory video featuring Brian Vargas, one of the authors of the specification.

Successes for the Release
How is the success of this initiative measured?
• There have been close to 300 downloads from the SourceForge site.
• The Google group has over 120 participants.
• A significant percentage of the 130 NDIIPP partners have used the BagIt specification in their preservation transfers to the Library.
• The Library recently became aware of the open source Ruby BagIt, a Ruby Gem released in early 2009 to support use of the specification.

Outcomes for the Library
• The Library's first open source software release.
• BagIt is in use with multiple NDIIPP partners, in the eDeposit pilot project, and for the packaging and transport of file packages internally.
• Gradual development of graphical workflow tools for all active projects.
• The transfer of partner content has informed the Library’s own preservation efforts, building our understanding of what we need to know about files and what events in their life cycle we need to record and track.
• The Inventory Tool will support the Library's initial efforts in a file-level preservation audit.
• Put all tools and services into full production during 2009.

Questions?
Leslie Johnston