LC Perspective: Preservation Partnerships. Martha Anderson, Program Officer, NDIIPP, Office of Strategic Initiatives, Library of Congress. April 2005.


1 The Library of Congress
LC Perspective: Preservation Partnerships
Martha Anderson, Program Officer, NDIIPP
Office of Strategic Initiatives, Library of Congress
April 2005

2 Born Digital “At-Risk” Web Sites
http://www.loc.gov/minerva/collect/elec2000
http://www.loc.gov/minerva/collect/sept11

3 NDIIPP Strategic Direction
Take actions that are:
Catalytic
–Invest in existing strengths
Collaborative
–Engage partners in areas of mutual interest and expertise
Iterative
–Learn by doing
Strategic
–Broad spectrum of balanced short-term & long-term investments

4 Web of projects
[Diagram labels: LC Web Projects, UIUC, NARA, GPO, IIPC, NDIIPP, CDL, IA, AIHT, Preservation Partners, States Initiative]

5 Library of Congress Web Archiving Strategy
Collaborate with partners working on the same preservation issues
Develop collection strategies to leverage available resources
Learn by doing

6 Collaborate with partners working on the same preservation issues
Membership in the International Internet Preservation Consortium (IIPC)
Cooperative projects with NDIIPP Preservation Partners
–California Digital Library
–University of Illinois at Urbana-Champaign
Technical information sharing with other US government agencies
–Government Printing Office
–National Archives and Records Administration

7 Develop collection strategies to leverage available resources
–Collect thematically both by crawling and by acquiring collections gathered by others
Learn by doing
–Case studies and regular collection of theme-based collections
–Participate in tools development with IIPC
–Archive Ingest & Handling Project

8 Challenges of collecting from the Web
Characteristics of the resource: dynamic, deep, linked
Intellectual property laws and regulations
Tension between preservation and access goals
Degree of alignment with current collection policies for other media
Curation strategy
Tools for identification and selection
Tools for collection, curation, and archiving of large web collections

9 Average Web Collection
Begins with a theme or event
Usually does not include commercial sites
Starts with a list of about 200 URLs
Is crawled by a vendor
Yields about 1 TB of data per month
Is crawled once a week
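
As a back-of-the-envelope reading of the figures on this slide, the Python sketch below estimates the average data volume per weekly crawl and per seed URL. The even split across crawls and seeds, and the decimal conversion 1 TB = 1000 GB, are assumptions made for illustration; they are not figures from the presentation.

```python
# Figures quoted on the slide (all approximate).
SEED_URLS = 200          # size of the starting URL list
TB_PER_MONTH = 1.0       # monthly yield in terabytes
CRAWLS_PER_YEAR = 52     # one crawl per week


def tb_per_crawl(tb_per_month: float = TB_PER_MONTH,
                 crawls_per_year: int = CRAWLS_PER_YEAR) -> float:
    """Average terabytes gathered by a single crawl."""
    crawls_per_month = crawls_per_year / 12
    return tb_per_month / crawls_per_month


def gb_per_seed(tb_crawl: float, seeds: int = SEED_URLS) -> float:
    """Average gigabytes per seed URL in one crawl (assumes 1 TB = 1000 GB)."""
    return tb_crawl * 1000 / seeds


if __name__ == "__main__":
    per_crawl = tb_per_crawl()
    print(f"{per_crawl:.2f} TB per crawl")                    # 0.23 TB per crawl
    print(f"{gb_per_seed(per_crawl):.2f} GB per seed URL")    # 1.15 GB per seed URL
```

Under these assumptions a single weekly crawl averages roughly a quarter of a terabyte, which is one reason the slide's "tools for large web collections" concern matters even for a 200-URL seed list.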

10 Web Collections to date at LC
Event-based
–US National Elections: 2000, 2002, 2004
–War in Iraq
–September 11
Public Policy Topics
–Health Care
–Legislative Branch
–Terrorism
26 TB

11 Archive Ingest & Handling Test
AIHT is a first test of the proposed NDIIPP preservation architecture.
The test is conducted with a common data set.
–George Mason University 9/11 Archive
Phase I tests ingest and data handling in local systems.
Phase II tests export and import between institutions.
Phase III explores format migration.
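
One common way to check that an archive survives ingest (Phase I) and exchange between institutions (Phase II) is to compare per-file checksums before and after the transfer. The Python sketch below illustrates that general idea only; the presentation does not say which fixity method the AIHT participants used, and the function names and the choice of SHA-256 are assumptions.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of one file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its checksum."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_manifest(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the relative paths that are missing or whose checksum differs."""
    problems = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != expected:
            problems.append(rel_path)
    return problems
```

A sending institution would run `build_manifest` on its copy and ship the result alongside the data; the receiver would run `verify_manifest` on the imported copy and investigate any path it returns.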

12 GMU 9/11 Archive
Participants demonstrate capabilities
Participants exchange archive

13 Participants
Old Dominion University, Department of Computer Science
Stanford University Libraries & Academic Information Resources
The Johns Hopkins University, Sheridan Libraries
Harvard University Library

14 George Mason University 9/11 Archive: Breakdown by File Types
57,450+ files
12 GB
Originally stored in a Linux environment
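
A breakdown by file type like the one this slide reports can be approximated by tallying file extensions across a directory tree. The Python sketch below is illustrative only: it assumes the extension is a usable proxy for the file type, which the presentation does not claim, and the function name is hypothetical.

```python
import os
from collections import Counter


def breakdown_by_extension(root: str) -> tuple[Counter, Counter]:
    """Return (file counts, total bytes) keyed by lowercased file extension."""
    counts: Counter = Counter()
    sizes: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Files with no extension are grouped under "(none)".
            ext = os.path.splitext(name)[1].lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += os.path.getsize(os.path.join(dirpath, name))
    return counts, sizes


if __name__ == "__main__":
    file_counts, byte_totals = breakdown_by_extension(".")
    for ext, n in file_counts.most_common():
        print(f"{ext:10} {n:8d} files {byte_totals[ext]:14d} bytes")
```

Extensions can be missing or misleading in a real archive, so a production assessment would more likely rely on a format-identification tool than on file names.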

15 Goals of AIHT
Gain practical experience with multiple institutions
Document transfer and ingest processes for multiple systems
Determine next set of tasks for developing interfaces between layers and institutions

16 Status of AIHT
All phases completed.
–Imports focused on technical assessment of the archive and on developing tools to examine it
–Exports included METS and MPEG-21 DID objects
–Migrations included transforms to JPEG 2000 and TIFF, and some exploration of HTML to XML and AVI to MPEG
Full report expected by early summer.
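
To give a sense of what a METS export wraps, the Python sketch below builds a skeletal METS document that lists files in a `fileSec` and points to them from a `structMap`. It is a minimal illustration using the standard library, not a reconstruction of what the AIHT participants produced; the function name, the generated file IDs, and the example paths are hypothetical, and the output has not been validated against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)


def build_mets(files: list[tuple[str, str]]) -> ET.Element:
    """Build a skeletal METS tree from (relative path, MIME type) pairs."""
    root = ET.Element(f"{{{METS_NS}}}mets")
    file_sec = ET.SubElement(root, f"{{{METS_NS}}}fileSec")
    file_grp = ET.SubElement(file_sec, f"{{{METS_NS}}}fileGrp")
    struct_map = ET.SubElement(root, f"{{{METS_NS}}}structMap")
    top_div = ET.SubElement(struct_map, f"{{{METS_NS}}}div")
    for index, (path, mime) in enumerate(files, start=1):
        file_id = f"FILE{index:05d}"  # hypothetical ID scheme
        file_el = ET.SubElement(
            file_grp, f"{{{METS_NS}}}file", {"ID": file_id, "MIMETYPE": mime}
        )
        ET.SubElement(
            file_el,
            f"{{{METS_NS}}}FLocat",
            {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": path},
        )
        # The structural map refers back to each file by its ID.
        ET.SubElement(top_div, f"{{{METS_NS}}}fptr", {"FILEID": file_id})
    return root


if __name__ == "__main__":
    tree = build_mets([("pages/index.html", "text/html"),
                       ("images/photo.jpg", "image/jpeg")])
    print(ET.tostring(tree, encoding="unicode"))
```

A real export would also carry descriptive and administrative metadata sections and per-file checksums, which are omitted here to keep the structure visible.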

17 For more information
NDIIPP Technical Architecture version 0.2 http://www.digitalpreservation.gov
International Internet Preservation Consortium http://netpreserve.org/about/index.php
MINERVA: Mapping the INternet Electronic Resources Virtual Archive http://www.loc.gov/minerva/

18 Martha Anderson
NDIIPP Program Officer
Office of Strategic Initiatives
The Library of Congress
Washington, DC
mande@loc.gov

