The Library of Congress Cooperative Web Archiving Project
Abbie Grotke, Library of Congress
Grant Harris, Library of Congress
Jennifer Long, Georgetown University
November 4, 2009
Agenda
– LC's Web archiving program
– Overview of the Cooperative Project
– Featured Partner: Georgetown University
– Lessons Learned
Library of Congress Web Archives: loc.gov/lcwa
LC Collections: over 130 TB
– US National Elections: 2000, 2002, 2004, 2006, 2008
– Iraq War 2003
– September 11, 2001 & September 11 Remembrance 2002
– Olympics 2002
– Congress: 106th, 107th, 108th, 109th, 110th
– Supreme Court Nominations
– Legal Blawgs
– Papal Transition
– Overseas Operations: Indian and Indonesian Elections
– Case Studies: health care, terrorism, visual image content, organizational Web sites, Crisis in Darfur, "single site"
http://www.loc.gov/webarchiving/projects.html
Organizational Structure
WEB ARCHIVING TEAM: In the Office of Strategic Initiatives (OSI). We are project managers and technical staff focused on capture, tools, and permissions.
INFORMATION TECHNOLOGY OFFICE and TECHNICAL ARCHITECTURE TEAM: Also in OSI. Supports Wayback and Web Curator Tool development, repository development, and data transfers. Contractors are also used in this area.
CURATORS/RECOMMENDING OFFICERS: Staff in Library Services, the Congressional Research Service, and the Law Library select the collections and the URLs to archive, and research whom to contact for permission.
BIBLIOGRAPHIC ACCESS: MODS records are created in Library Services; staff in the Network Development and MARC Standards Office and in Acquisitions and Bibliographic Access do the cataloging.
Collaborations and Partnerships
– Early collections: Elections 2000 and 2002, September 11
– End of Term Project
– Hurricane Katrina Archive
– IIPC: upcoming Olympics Collection
– NDIIPP Partners
– K-12 Web Archiving
– Cooperative Archive-It projects
Problem
– Web content that will be important for future research is disappearing before it can be collected
– Identifying sites and reviewing captured sites is labor-intensive; LC staff are stretched thin
– Outside institutions may not have the resources or budgets to collect Web sites
Cooperative Archive-It Project Concept
– Enlist Library Services subject experts to identify international and national high-value collecting areas, with a focus on foreign countries experiencing volatile political situations
– Enlist Library Services subject experts to identify scholarly centers, or partner institutions, with recognized expertise in those collecting areas to assist in collecting and preserving important at-risk materials
– Prioritize collecting areas/centers of expertise (7 priority areas selected)
Goals
– To enable institutions outside the Library to gain experience creating Web site collections
– To extend the network of NDIIPP partners working to identify and collect high-value, at-risk Web materials
– To develop subject-area collections that could become part of the Library's collections in the future
– To broaden understanding of issues related to developing curated collections of Web content
The Library of Congress agreed to:
– Establish and fund an Archive-It account for the partner for up to one year (with possible extension)
– Provide support as needed
– Provide subject matter expertise as requested by the partner
– Invite partner institutions to at least one conference at the Library (if funding is available)
– Maintain a second copy of the harvested content
Each Center Was Asked To:
– Identify high-risk, high-value Web sites for its area and use Archive-It to harvest them
– Document its selection criteria and provide them to the Library
– Document issues, lessons learned, etc. related to its Web collecting
– Participate in a conference with Library experts and other participants (if scheduled)
Partner collections (partner: collection; collecting dates; documents; size)
– Electronic Literature Organization: Literary Sites; July 12, 2008 – (ongoing); 9,214,920 documents; 401.29 GB
– George Washington University, Institute for European, Russian, and Eurasian Studies: Russian Parliamentary Elections (December 2007) and Russian Presidential Election (2008); August 13, 2007 – August 12, 2008; 18,175,664 documents; 870.09 GB
– Georgetown University: Belarus, Moldova, Ukraine; September 17, 2007 – (ongoing); 19,880,435 documents; 580 GB
– University of North Carolina, Chapel Hill: Islam in Asia; September 27, 2007 – February 1, 2008; 3,856,205 documents; 105.35 GB
– Stanford University Libraries, Islamic Studies: Iranian Blogs; February 29, 2008 – (ongoing); 27,997,040 documents; 2,099.70 GB
– George Washington University, Center for Global Health: Avian bird flu in Asian countries; June 3, 2008 – January 6, 2009; 18,699,986 documents; 640.6 GB
Featured Partner: Georgetown University
Belarus, Moldova, Ukraine Collection
Proposed by LC Curator: Grant Harris
Aim: to capture fragile Web sites from Belarus, Moldova, and Ukraine, including selected government sites, opposition parties, ethnic and religious groups, and sites covering elections and security issues.
Lessons Learned
– Finding good partners was KEY: partners should be committed and really "get" the concept of Web archiving and of archiving primary source materials
– Crawling ALL of Twitter: not so good
– There was confusion over LC's own Web archiving program vs. this project
Lessons Learned (continued)
Collaborative collection building is a good thing:
– New partnerships formed
– New ways for our curators to get engaged with Web archiving
– Partners collected some content that LC might not have been able to archive on its own (permissions, staff time, etc.)
Next Steps
– Three partners collecting for (at least) another year: ELO, Georgetown, and Stanford
– Focus on description and access: George Washington University/Russian Elections
– Future: data transfer to LC
For more information
– LC Web Archiving: http://www.loc.gov/webarchiving/
– LCWA: http://loc.gov/lcwa/
– National Digital Information Infrastructure and Preservation Program: http://www.digitalpreservation.gov/
– Georgetown's Archive-It collections: http://archive-it.org/public/partner?id=168
Questions?
– Abbie Grotke, Library of Congress
– Grant Harris, Library of Congress
– Jennifer Long, Georgetown University