Office of Strategic Initiatives All Hands Meeting-March 2010 Challenges in Web Archiving: Library of Congress Edition Abbie Grotke, Web Archiving Team.


1 Office of Strategic Initiatives All Hands Meeting, March 2010
Challenges in Web Archiving: Library of Congress Edition
Abbie Grotke, Web Archiving Team
NDIIPP Partner Meeting, July 21, 2010

2 Library of Congress Web Archiving Program
10 years of archiving
5 full-time OSI staff on our team, plus 2 contractors, and other IT and Web Services support
80+ staff selecting content for our collections: Library Services, Law Library, and Congressional Research Service
30+ event and thematic collections
12,500+ URLs processed and permissions sent
181 TB of content collected

3 What We Do Pretty Well At This Point
Web archiving workflows and processes have evolved and become more institutionalized
Improved crawling strategies so we can react more quickly, manage our archive data better, and better serve our customers at LC
  Large-scale contract crawling by Internet Archive
  A move from collection-by-collection crawling to monthly and weekly “crawl buckets”
  Small-scale in-house crawling now available for tests and emergency crawls

4 What We Do Pretty Well At This Point
Better tools now to more easily manage our team’s work and all data about various activities: nomination, permissions, crawling, quality review, reporting, etc.
Automation of manual activities to reduce time spent processing URLs for our nominators and our team

5 Ongoing Challenges
Selection
  What to select: so many URLs, so little time
  No full-time selection staff, everyone is busy
Quality Review
  Training to involve Nominators more in the process: “Did we get what you wanted us to get?”
Team Resources
  14 web archive projects actively crawling
  Testing our bandwidth

6 Ongoing Challenges
Legal
  Permissions: still only about 50% response rate
  Access for Researchers
Harvesting
  Collection of specific types of content: rapidly changing news content, YouTube
  Training Nominators re: frequency of collection
  Ramping up in-house crawling (Can we? Should we?)
The Data
  How do we transfer this content easily, both from IA and within LC?
  How do we manage it, store it, and preserve it?

7 More Information
Web Archiving Team Public Page (about the activity): http://www.loc.gov/webarchiving/
Library of Congress Web Archives (our collections): http://lcweb2.loc.gov/diglib/lcwa/html/lcwa-home.html
Digital Preservation Video on Web Archiving: http://www.digitalpreservation.gov/videos/webarch09/index.html
Contact: Abbie Grotke, abgr@loc.gov

