How to Face the Challenges of Web Archiving? The experiences of a small library on the edge. Chloe Martin, Internet Memory; Catherine Ryan, National Library of Ireland.

Presentation transcript:

How to Face the Challenges of Web Archiving? The experiences of a small library on the edge.
Chloe Martin, Internet Memory
Catherine Ryan, National Library of Ireland
LIBER

Context: National Library of Ireland
- Beginnings: established by the Dublin Science and Art Museum Act, 1877
- Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”
- The Digital Record: Born Digital Programme established in 2010, covering web archiving
- Web Archive Projects: 2 pilot projects in 2011

Context: Internet Memory
European Archive / Internet Memory Foundation
- Established in 2004 in Amsterdam (offices also in Paris)
- Mission: to preserve Web content as a new medium for current and future generations
- Actions: awareness raising, partnerships, R&D
- Open Access Collections: UK National Archives & Parliament, PRONI, CERN and the National Library of Ireland
Internet Memory Research
- Spin-off of IM established in June 2011 in Paris
- Missions: to operate large-scale or selective crawls and develop new technologies (crawl, access, processing and extraction)

Web Archiving Project: Project Origins
National Library of Ireland
Building a 21st Century Library:
- Born Digital
- Digitisation
- Single Integrated Catalogue
- Digital Repository
- OSCAIL, the Digital Library Programme

Web Archiving Project: Project Origins
National Library of Ireland
Born Digital Materials:
- Natural progression for NLI’s strong political, cultural and historical collections
- How best to approach this in a time of unprecedented financial difficulty?
- Born Digital Programme established to examine requirements and produce a policy document for the next steps

Web Archiving Project: Project Origins
National Library of Ireland
The Hand of History:
- Snap General Election
- Five Weeks

Web Archiving Project: Project Origins
National Library of Ireland
Just do it

Web Archiving Project: Project Origins
National Library of Ireland
Just do it
How?

Web Archiving Project: Project Origins
National Library of Ireland
Collaborative Partnership: a partner that suited our requirements and had experience with others in the cultural sector
Requirements:
- Technical skills: NLI staff with these skills were working on other projects, so the partner had to supply them
- Leverage NLI’s own strong curatorial experience, especially in politics
- Fast!

Web Archiving Project: Project Origins
National Library of Ireland
Project phases:
- Project scoping and contract
- Site selection
- Permissions gathering
- QA (look and feel)
- Publication and promotion

Site Selection and Permissions
National Library of Ireland
Selection Criteria:
- Website presence
- Technical reasons
- Cut-off date
- Women candidates
Permissions:
- All sites contacted and provided with a brief
- Pressurised but necessary phase

Scope of projects
National Library of Ireland
General Election:
- Crawl: 200 snapshots
- Scope: 100 seeds
- Frequency: 2 times
- Date: Feb
Presidential Election:
- Crawl: 80 snapshots
- Scope: 70 seeds
- Frequency: 3 times
- Date: Oct-Nov

Crawl
Internet Memory
- Seeds Validation: URLs, duplication, redirection, external links, dynamic websites
- Scope Parameters: domain, host and path; social Web content; frequency; robots.txt file exclusions; politeness
- Specific incidents led to technical changes on the fly: modification of scope; pending crawls; adaptation of the politeness
- Improvement of second crawl
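The seed validation and scope checks listed on this slide could look roughly like the following. This is a minimal sketch, not the tooling Internet Memory used: the function names, the `example-archiver` user agent and the `candidate.example` URLs are all hypothetical, and redirection handling, social Web content and politeness delays are left out.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def normalise_seed(url):
    """Lower-case the host, drop the fragment, and default an empty path to '/'."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not a crawlable seed: {url!r}")
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def dedupe_seeds(urls):
    """Return normalised seeds in first-seen order, without duplicates."""
    seen, unique = set(), []
    for url in urls:
        seed = normalise_seed(url)
        if seed not in seen:
            seen.add(seed)
            unique.append(seed)
    return unique


def in_scope(url, seed):
    """True when url is on the seed's host and under the seed's path (host + path scope)."""
    u, s = urlsplit(url), urlsplit(seed)
    same_host = u.netloc.lower() == s.netloc.lower()
    return same_host and (u.path or "/").startswith(s.path or "/")


def allowed_by_robots(url, robots_lines, agent="example-archiver"):
    """Check a URL against already-downloaded robots.txt lines (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    seeds = dedupe_seeds([
        "HTTP://Candidate.Example",
        "http://candidate.example/",
        "http://candidate.example/#top",
    ])
    print(seeds)
    print(in_scope("http://candidate.example/news/1", seeds[0]))
    print(allowed_by_robots(
        "http://candidate.example/private/x",
        ["User-agent: *", "Disallow: /private/"],
    ))
```

Whether a crawl honours or overrides robots.txt is a policy decision for the archiving institution; the sketch only shows how such a rule can be evaluated.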

Quality Assurance (QA)
National Library of Ireland
- Manual QA
- Jira software
- IM: technical QA
- NLI: ‘look and feel’ QA
- Multiple browsers
- Communication with site owners (building relationships and promotion)

Quality Assurance (QA)
Internet Memory
- Why?
- How?
- Manual and visual method: homepage + 2
- Resolution of issues
- Temporal coherence

Access
National Library of Ireland
- Available to the public
- Full text search
- IM website: search by keyword, URL
- NLI catalogue: keyword search via a widget developed by the NLI IS team and IM
- Future: access through NLI’s own interfaces, issue of integrating results

Publication and Promotion
National Library of Ireland
- NLI social media initiative (Twitter and blog)
- Project participants
- Print media (especially in the area of technology)
- And IM!
Usage figures have increased, but the real value will be more apparent in 5-10 years

Usage Statistics of Web Archive
National Library of Ireland
- 21/09/2011: Official launch of NLI Web archives (Tweets)
- 26/10/2011: Blog post on nli.ie/blog and article on thejournal.ie
- 25/11/2011: Article on irishtimes.com
- 20/01/2012: Article on irishtimes.com
- 17/03/2012: Post on soundofthearchives.wordpress.com
- 04/05/2012: Article on irisheconomy.ie

Advantages of Web Archiving
National Library of Ireland
Web archiving:
- New opportunities for delivery of materials to users
- Work with existing users’ expectations that content be online
- Reach new audiences

Advantages of Web Archiving
National Library of Ireland
Political web archives: Irish General Election
- Researchers can compare online content pre- and post-election
- Facilitates research into how ‘online’ this election was
- Assess impact of technological developments in campaign communications
- Record of campaign information

Benefits of Working Together
National Library of Ireland
Pilot project for a long-term activity:
- Allowed us to enter a new collecting area despite lack of tech expertise
- Facilitated collection of important material that no one else was collecting
- Collect material quickly
- Leverage curatorial skills
- Gained new technical skills

Benefits of Working Together
Internet Memory
- To support the development of Web archiving initiatives
- To operate rapid deployment of Web archives
- To address new challenges in this area: social media content, QA, automation

Conclusion
General Election: 18,495,771 URLs, 1.14 TB, 10,405 ARCs
Presidential Election: 7,333,399 URLs, GB, 2,513 ARCs
View the NLI collections at: collections.aspx
View the Web archive blog entry at: /26/general-election-2011-web-archiving/
View Internet Memory Collections at:
To be continued…

Questions?
Thanks for your attention!
Chloe Martin, Internet Memory, http://internetmemory.org
Catherine Ryan, National Library of Ireland