Office of Strategic Initiatives All Hands Meeting-March 2010 Challenges in Web Archiving: Library of Congress Edition Abbie Grotke, Web Archiving Team.

Presentation transcript:

Office of Strategic Initiatives All Hands Meeting-March 2010 Challenges in Web Archiving: Library of Congress Edition Abbie Grotke, Web Archiving Team NDIIPP Partner Meeting, July 21, 2010

Library of Congress Web Archiving Program
p years of archiving
5 full-time OSI staff on our team, plus 2 contractors, and other IT and Web Services support
80+ staff selecting content for our collections: Library Services, Law Library, and Congressional Research Service
30+ event and thematic collections
12,500+ URLs processed and permissions sent
181 TB of content collected

What We Do Pretty Well At This Point
Web archiving workflows and processes have evolved and become more institutionalized
Improved crawling strategies so we can react more quickly, manage our archive data better, and better serve our customers at LC
Large-scale contract crawling by Internet Archive
A move from collection-by-collection crawling to monthly and weekly “crawl buckets”
Small-scale in-house crawling now available for tests and emergency crawls

What We Do Pretty Well At This Point
Better tools now to more easily manage our team’s work and all data about various activities: nomination, permissions, crawling, quality review, reporting, etc.
Automation of manual activities to reduce time spent processing URLs for our nominators and our team

Ongoing Challenges
Selection
  What to select - so many URLs, so little time
  No full-time selection staff, everyone is busy
Quality Review
  Training to involve Nominators more in the process – “Did we get what you wanted us to get?”
Team Resources
  14 web archive projects actively crawling
  Testing our bandwidth

Ongoing Challenges
Legal
  Permissions: still only about 50% response rate
  Access for Researchers
Harvesting
  Collection of specific types of content: rapidly changing news content, YouTube
  Training Nominators re: frequency of collection
  Ramping up in-house crawling (Can we? Should we?)
The Data
  How do we transfer this content easily? From IA and within LC
  How do we manage it, store it, and preserve it?

More Information
Web Archiving Team Public Page (about the activity):
Library of Congress Web Archives (our collections):
Digital Preservation Video on Web Archiving:
Contact: Abbie Grotke,