The capture and preservation of websites at the National Library of New Zealand. Gillian Lee, Alexander Turnbull Library.

1 The capture and preservation of websites at the National Library of New Zealand
Gillian Lee, Alexander Turnbull Library

2 Overview
- National Library Act 2003: legal deposit extension to include electronic publications
- Selective web harvesting (Alexander Turnbull Library)
- Whole of Domain harvest (National Library of New Zealand)
- How archived web content at NLNZ might assist the web record keeping process

3 Legal Deposit
National Library of New Zealand (Te Puna Mātauranga o Aotearoa) Act 2003:
- Requires the Library to collect, preserve, protect and provide access to New Zealand’s documentary heritage.
- Legal deposit extended to include internet documents from August 2006 onwards. This includes websites.
- Publishers’ responsibility: legally required to assist the National Library to make a copy of their internet document upon request.
- Publishers are encouraged to deposit internet documents such as online books and serials with NLNZ, but not websites. NLNZ uses a web harvester instead.

4 Context
- 377,341 .nz domain names currently registered, all potentially in scope for legal deposit.
- Collection policy for Alexander Turnbull Library extends beyond legal deposit to include websites published in the Pacific and websites published by New Zealanders overseas.
- A huge task! How do we meet this challenge? A combination of selective harvesting and whole of domain harvesting.

5 Selective web harvesting

6 Undertaken by Alexander Turnbull Library Electronic Publications Team
- 3 full time staff
- Around 150 websites archived each month
- Limiting factors:
  - People: limits frequency of harvesting each site
  - Time: length of time it takes to harvest a website
  - Bandwidth: the number of concurrent harvests
- Prioritisation necessary!
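A rough calculation shows why prioritisation is unavoidable. The sketch below uses only two figures from these slides (377,341 registered .nz domain names and around 150 websites archived each month). It assumes, purely for illustration, that every domain is a single website and that each one is harvested once only; the function name is hypothetical.

```python
import math

REGISTERED_NZ_DOMAINS = 377_341   # figure from the Context slide
SITES_ARCHIVED_PER_MONTH = 150    # approximate figure from this slide


def months_to_cover(domains: int, sites_per_month: int) -> int:
    """Months needed to harvest every domain once at a fixed monthly rate."""
    return math.ceil(domains / sites_per_month)


months = months_to_cover(REGISTERED_NZ_DOMAINS, SITES_ARCHIVED_PER_MONTH)
years = months / 12
print(f"{months} months (about {years:.0f} years) for a single pass")
# prints "2516 months (about 210 years) for a single pass"
```

Since many selected sites are harvested repeatedly, the real gap is larger still, which is why selective harvesting is combined with whole of domain harvesting.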

7 Web selection priorities
- Considerations:
  - Collection building: New Zealand Web Archive
  - Content: historical, cultural and research value
  - “At risk” sites
- Selection areas:
  - Government websites, e.g. Ministries and Departments
  - Māori websites, e.g. Treaty, Iwi
  - Events, e.g. general elections (2002-); local body elections (2007-)
  - Thematic approach: ethnic/community groups, the arts, music, environment, science, health
  - Websites that reflect our social and political history

8 The process
- Each website is assessed for selection.
- Each website is harvested using Web Curator Tool.
- Each website is quality reviewed before archiving.
- Archived websites and their associated metadata are preserved in the NDHA (National Digital Heritage Archive), where preservation actions can be undertaken in future. ARC format is used as the preservation standard.
- Each website is fully described and accessible via the online catalogue (http://nlnzcat.natlib.govt.nz/).

17 Current web archiving limitations
- Deep web (databases, websites that require logins, portals)
- Flash: affects some music and video capture
- JavaScript: affects the viewing of some websites (browser issues in IE, Firefox, etc.)
- 3rd party hosted sites (legal issues, e.g. YouTube)

18 Domain harvesting

19 Harvesting is scaled widely to the .nz domain level
- Provides a snapshot of New Zealand’s internet
- A wider range of websites is collected
- We crawl more deeply in priority areas

20 Whole of Domain harvest 2008
- 10 days
- 106 million URLs
- 397 thousand hosts
- 4.5 terabytes of data downloaded
- Data securely stored on a server at NLNZ
- Data not catalogued and not accessible to the public
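To give a feel for the scale of these figures, the following sketch derives a few averages from them. It assumes that the crawl ran continuously for the full 10 days, that "terabytes" means decimal terabytes (10^12 bytes), and that the slide figures are rounded, so the results are only indicative.

```python
DAYS = 10
URLS = 106_000_000
HOSTS = 397_000
BYTES_DOWNLOADED = 4.5e12  # assumes decimal terabytes

# Assumes the harvester ran around the clock for the whole period.
seconds = DAYS * 24 * 60 * 60

urls_per_second = URLS / seconds
urls_per_host = URLS / HOSTS
kilobytes_per_url = BYTES_DOWNLOADED / URLS / 1_000
megabits_per_second = BYTES_DOWNLOADED * 8 / seconds / 1_000_000

print(f"URLs fetched per second: {urls_per_second:.0f}")
print(f"URLs per host (average): {urls_per_host:.0f}")
print(f"Size per URL (average):  {kilobytes_per_url:.1f} kB")
print(f"Sustained download rate: {megabits_per_second:.1f} Mbit/s")
```

Under these assumptions the harvest averages roughly 123 URLs per second and a little over 40 Mbit/s sustained, with a few hundred URLs collected per host.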

21 Whole of Domain harvest 2008

23 Whole of Domain harvest 2010
- Second domain harvest planned for first half of 2010
- Public sector sites will be a priority
- Have applied to get the “.nz zone file” from the Domain Name Commissioner to ensure more even coverage

24 The bigger picture How archived web content at NLNZ might assist the web record keeping process...

25 The bigger picture
- MOU between Archives New Zealand and NLNZ
- Decommissioning a website? Let us know! We might want to harvest the website for legal deposit.
- Alert NLNZ to any new website or existing website that hasn’t been archived by NLNZ and we can assess it.

26 Web designers take note!
- Websites that adhere to the Government Web Standards increase the likelihood of a successful web harvest.
- If you can’t find your way around your website with JavaScript turned off, chances are our web harvester can’t either.
- Use RSS feeds to alert the public (and us) when new documents are posted to the websites.
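The JavaScript point can be illustrated with a small sketch of what a script-blind crawler sees. This is not how Web Curator Tool works internally; it is a simplified stand-in that assumes a harvester which reads only the raw HTML and follows plain anchor links. The class name, function name and sample page are all hypothetical.

```python
from html.parser import HTMLParser


class StaticLinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the raw HTML."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only ordinary anchors count; scripts are never executed.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str) -> list[str]:
    """Return the links discoverable without running any JavaScript."""
    collector = StaticLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hypothetical page: one plain link, two that exist only via JavaScript.
PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <span onclick="window.location='/reports.html'">Reports</span>
  <script>
    var a = document.createElement('a');
    a.href = '/news.html';
    document.body.appendChild(a);
  </script>
</body></html>
"""

print(static_links(PAGE))  # prints ['/about.html']
```

Only the plain link is found. The Reports and News pages are reachable in a browser but invisible to this kind of crawler, which is the situation the slide warns about.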

27 Further information
Got questions? Got a new website? Losing a website?!?
Email Gillian Lee: Web.Archive@natlib.govt.nz
For more information:
- The New Zealand Web Archive: http://www.natlib.govt.nz/collections/a-z-of-all-collections/nz-web-archive
- Legal deposit: http://www.natlib.govt.nz/services/legal-deposit-donations/legal-deposit-intro

