How to Face the Challenges of Web Archiving? The experiences of a small library on the edge. Chloe Martin, Internet Memory Catherine Ryan, National Library.

1 How to Face the Challenges of Web Archiving? The experiences of a small library on the edge.
Chloe Martin, Internet Memory
Catherine Ryan, National Library of Ireland
LIBER 2012

2 Context: National Library of Ireland
Beginnings: Established by the Dublin Science and Art Museum Act, 1877
Mission: “to collect, preserve, promote and make accessible the documentary and intellectual record of the life of Ireland”
The Digital Record: Born Digital Programme established in 2010, covering web archiving
Web Archive Projects: 2 pilot projects in 2011

3 Context: Internet Memory
European Archive / Internet Memory Foundation
– Established in 2004 in Amsterdam (offices also in Paris)
– Mission: to preserve Web content as a new medium for current and future generations
– Actions: awareness raising, partnerships, R&D
– Open Access Collections: UK National Archives & Parliament, PRONI, CERN and The National Library of Ireland
Internet Memory Research
– Spin-off of IM established in June 2011 in Paris
– Missions: to operate large-scale or selective crawls and to develop new technologies (crawl, access, processing and extraction)

4 Web Archiving Project: Project Origins
National Library of Ireland
Building a 21st Century Library:
– Born Digital
– Digitisation
– Single Integrated Catalogue
– Digital Repository
– OSCAIL, the Digital Library Programme

5 Web Archiving Project: Project Origins
National Library of Ireland
Born Digital Materials:
– Natural progression for NLI’s strong political, cultural and historical collections
– How best to approach this in a time of unprecedented financial difficulty?
– Born Digital Programme established to examine requirements and produce a policy document for the next steps

6 Web Archiving Project: Project Origins
National Library of Ireland
The Hand of History:
– Snap General Election
– Five Weeks

7 Web Archiving Project: Project Origins
National Library of Ireland
Just do it

8 Web Archiving Project: Project Origins
National Library of Ireland
Just do it
How?

9 Web Archiving Project: Project Origins
National Library of Ireland
Collaborative Partnership: a partner that suited our requirements and had experience with others in the cultural sector
Requirements:
– Technical skills in the NLI were tied up on other projects; we needed these skills
– Leverage NLI’s own strong curatorial experience, esp. in politics
– Fast!

10 Web Archiving Project: Project Origins
National Library of Ireland
Project phases:
– Project scoping and contract
– Site selection
– Permissions gathering
– QA (look and feel)
– Publication and promotion

11 Site Selection and Permissions
National Library of Ireland
Selection Criteria:
– Website presence
– Technical reasons
– Cut-off date
– Women candidates
Permissions:
– All sites contacted and provided with a brief
– Pressurised but necessary phase

12 Scope of projects
National Library of Ireland
General Election:
– Crawl: 200 snapshots
– Scope: 100 seeds
– Frequency: 2 times
– Date: Feb. 2011
Presidential Election:
– Crawl: 80 snapshots
– Scope: 70 seeds
– Frequency: 3 times
– Date: Oct-Nov. 2011

13 Crawl
Internet Memory
Seeds Validation: URLs, duplication, redirection, external links, dynamic websites
Scope Parameters: domain, host and path; social Web content; frequency; robots.txt exclusion; politeness
Specific incidents → technical changes on the fly: modification of scope; pending crawls; adaptation of the politeness
Improvement of second crawl
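The seed validation and scope parameters on this slide can be illustrated with a minimal Python sketch. It is not Internet Memory's crawler configuration: the function names `normalize_seed`, `deduplicate` and `in_scope`, the three scope levels, and the `example.ie` URLs are all assumptions made for illustration. The "domain" level is also simplified, treating the seed's host as the domain rather than consulting a public suffix list.

```python
from urllib.parse import urlsplit


def normalize_seed(url: str) -> str:
    """Return a canonical form of a seed URL (hypothetical helper).

    Lower-cases the host, drops the fragment and defaults an empty path
    to '/', so that trivially different spellings compare equal.
    """
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"not a crawlable URL: {url!r}")
    host = parts.hostname  # urlsplit already lower-cases the hostname
    if parts.port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme}://{host}{path}{query}"


def deduplicate(seeds):
    """Normalize every seed and keep the first occurrence of each."""
    seen, unique = set(), []
    for seed in seeds:
        norm = normalize_seed(seed)
        if norm not in seen:
            seen.add(norm)
            unique.append(norm)
    return unique


def in_scope(url: str, seed: str, level: str = "host") -> bool:
    """Decide whether a discovered URL belongs to a seed's crawl scope.

    level is one of 'domain', 'host' or 'path' (assumed names):
      domain - the seed host and any of its subdomains
      host   - exactly the seed host
      path   - the seed host, and only URLs under the seed path
    """
    u, s = urlsplit(url), urlsplit(seed)
    u_host, s_host = u.hostname or "", s.hostname or ""
    if level == "domain":
        return u_host == s_host or u_host.endswith("." + s_host)
    if level == "host":
        return u_host == s_host
    if level == "path":
        return u_host == s_host and u.path.startswith(s.path)
    raise ValueError(f"unknown scope level: {level!r}")


seeds = deduplicate(
    ["HTTP://Example.ie", "http://example.ie/", "http://example.ie/#top"]
)
print(seeds)
print(in_scope("http://www.example.ie/news", "http://example.ie/", "domain"))
```

Redirection checks, robots.txt handling and politeness delays are left out here because they depend on live network behaviour and on the crawler in use.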

14 Quality Assurance (QA)
National Library of Ireland
– Manual QA
– Jira software
– IM: technical QA
– NLI: ‘look and feel’ QA
– Multiple browsers
– Communication with site owners (building relationships and promotion)

15 Quality Assurance (QA)
Internet Memory
– Why?
– How?
– Manual and visual method: homepage + 2
– Resolution of issues
– Temporal coherence

16 Access
National Library of Ireland
– Available to the public
– Full text search
– IM website: search by keyword, URL
– NLI catalogue: keyword search via widget developed by NLI IS team and IM
– Future: access through NLI’s own interfaces; issue of integrating results

17 Publication and Promotion
National Library of Ireland
– NLI social media initiative (Twitter and blog)
– Project participants
– Print media (esp. in the area of technology)
– And IM!
Usage figures have increased, but the real value will be more apparent in 5-10 years

18 Usage Statistics of Web Archive
National Library of Ireland
– 21/09/2011: Official launch of NLI Web archives (Tweets)
– 26/10/2011: Blog post on nli.ie/blog and Paper in thejournal.ie
– 25/11/2011: Paper on irishtimes.com
– 20/01/2012: Paper on irishtimes.com
– 17/03/2012: Post on soundofthearchives.wordpress.com
– 04/05/2012: Paper on irisheconomy.ie

19 Advantages of Web Archiving
National Library of Ireland
Web archiving:
– New opportunities for delivery of materials to users
– Work with existing users’ expectations that content be online
– Reach new audiences

20 Advantages of Web Archiving
National Library of Ireland
Political web archives; Irish General Election:
– Researchers can compare online content pre- and post-election
– Facilitates research into how ‘online’ this election was
– Assess impact of technological developments in campaign communications
– Record of campaign information

21 Benefits of Working Together
National Library of Ireland
Pilot project for a long-term activity:
– Allowed us to enter a new collecting area despite lack of tech expertise
– Facilitated collection of important material that no one else was collecting
– Collect material quickly
– Leverage curatorial skills
– Gained new technical skills

22 Benefits of Working Together
Internet Memory
To support the development of Web archiving initiatives
To operate rapid deployment of Web archives
To address new challenges in this area:
– Social media content
– QA automation

23 Conclusion
General Election: 18,495,771 URLs; 1.14 TB; 10,405 ARCs
Presidential Election: 7,333,399 URLs; 278.10 GB; 2,513 ARCs
View the NLI collections at: http://www.nli.ie/en/udlist/digital-collections.aspx
View the Web archive blog entry at: http://www.nli.ie/blog/index.php/2011/10/26/general-election-2011-web-archiving/
View Internet Memory Collections at: http://collections.europarchive.org/
To be continued…

24 Questions? Thanks for your attention!
Chloe Martin, Internet Memory: http://internetmemory.org, chloe@internetmemory.net, @InternetMemory
Catherine Ryan, National Library of Ireland: http://www.nli.ie, cryan@nli.ie, @NLIreland

