The capture and preservation of websites at the National Library of New Zealand Gillian Lee Alexander Turnbull Library.


Overview
- National Library Act
- Legal deposit extension to include electronic publications
- Selective web harvesting (Alexander Turnbull Library)
- Whole of Domain harvest (National Library of New Zealand)
- How archived web content at NLNZ might assist the web record keeping process

Legal Deposit
National Library of New Zealand (Te Puna Mātauranga o Aotearoa) Act 2003:
- Requires the Library to collect, preserve, protect and provide access to New Zealand’s documentary heritage.
- Legal deposit extended to include internet documents from August 2006 onwards. This includes websites.
- Publishers’ responsibility: legally required to assist the National Library to make a copy of their internet document upon request.
- Publishers are encouraged to deposit internet documents such as online books and serials with NLNZ, but not websites. NLNZ uses a web harvester instead.

Context
- 377,341 .nz domain names currently registered. All potentially in scope for legal deposit.
- Collection policy for Alexander Turnbull Library extends beyond legal deposit to include websites published in the Pacific and websites published by New Zealanders overseas.
- A huge task! How do we meet this challenge? A combination of selective harvesting and whole of domain harvesting.

Selective web harvesting

Undertaken by Alexander Turnbull Library Electronic Publications Team
- 3 full time staff
- Around 150 websites archived each month
- Limiting factors:
  - People: limits the frequency of harvesting each site
  - Time: the length of time it takes to harvest a website
  - Bandwidth: the number of concurrent harvests
Prioritisation necessary!
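To put the need for prioritisation in numbers, the following is a rough back-of-envelope sketch using only the two figures quoted in these slides (377,341 registered .nz names and around 150 websites archived each month). It assumes, purely for illustration, one website per domain name and a constant monthly rate; neither assumption comes from the presentation.

```python
# Figures quoted in the slides.
REGISTERED_NZ_DOMAINS = 377_341   # .nz domain names currently registered
SELECTIVE_RATE_PER_MONTH = 150    # websites archived each month by the team


def months_for_one_pass(domains: int, rate_per_month: int) -> float:
    """Months needed to harvest every domain once at a constant monthly rate.

    Assumption (not from the slides): one website per domain name.
    """
    return domains / rate_per_month


months = months_for_one_pass(REGISTERED_NZ_DOMAINS, SELECTIVE_RATE_PER_MONTH)
years = months / 12
print(f"{months:,.0f} months, or about {years:.0f} years, for a single pass")
```

Under these assumptions a single pass at the selective rate would take roughly two centuries, which is why selective harvesting is paired with whole of domain harvesting.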

Web selection priorities
- Considerations:
  - Collection building: New Zealand Web Archive
  - Content: historical, cultural and research value
  - “At risk” sites
- Selection areas:
  - Government websites, e.g. Ministries and Departments
  - Māori websites, e.g. Treaty, Iwi
  - Events, e.g. general elections (2002-); local body elections (2007-)
  - Thematic approach: ethnic/community groups, the arts, music, environment, science, health
  - Websites that reflect our social and political history

The process
- Each website is assessed for selection
- Each website is harvested using Web Curator Tool
- Each website is quality reviewed before archiving
- Archived websites + associated metadata are preserved in the NDHA where preservation actions can be undertaken in future. ARC format used as preservation standard.
- Each website is fully described and accessible via the online catalogue

Current web archiving limitations
- Deep web (databases, websites that require logins, portals)
- Flash: affects some music and video capture
- JavaScript: affects the viewing of some websites (browser issues in IE or Firefox, etc.)
- 3rd party hosted sites (legal issues, e.g. YouTube)

Domain harvesting

- Harvesting is scaled widely to the .nz domain level
- Provides a snapshot of New Zealand’s internet
- A wider range of websites are collected
- We crawl more deeply in priority areas

Whole of Domain harvest days 106 million URLs 397 thousand hosts 4.5 terabytes of data downloaded Data securely stored on a server at NLNZ Data not catalogued and not accessible to the public
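The headline figures on this slide can be turned into per-URL and per-host averages. The sketch below is only arithmetic on the three quoted numbers; it assumes decimal terabytes (10^12 bytes), which the slide does not specify, and the averages say nothing about how sizes were actually distributed.

```python
# Figures quoted on the slide.
URLS_HARVESTED = 106_000_000   # 106 million URLs
HOSTS_HARVESTED = 397_000      # 397 thousand hosts
DATA_TERABYTES = 4.5           # 4.5 terabytes downloaded

# Assumption (not stated on the slide): decimal terabytes.
BYTES_PER_TERABYTE = 10**12

avg_bytes_per_url = DATA_TERABYTES * BYTES_PER_TERABYTE / URLS_HARVESTED
avg_urls_per_host = URLS_HARVESTED / HOSTS_HARVESTED

print(f"Average size per URL: about {avg_bytes_per_url / 1000:.0f} kB")
print(f"Average URLs per host: about {avg_urls_per_host:.0f}")
```

On these assumptions the harvest averages a little over 40 kB per URL and a few hundred URLs per host.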

Whole of Domain harvest 2008

Whole of Domain harvest 2010
- Second domain harvest planned for first half of 2010
- Public sector sites will be a priority
- Have applied to get the “.nz zone file” from the Domain Name Commissioner to ensure more even coverage

The bigger picture How archived web content at NLNZ might assist the web record keeping process...

The bigger picture
- MOU between Archives New Zealand and NLNZ
- Decommissioning a website? Let us know! We might want to harvest the website for legal deposit.
- Alert NLNZ to any new website or existing website that hasn’t been archived by NLNZ and we can assess it.

Web designers take note!
- Websites that adhere to the Government Web Standards increase the likelihood of a successful web harvest.
- If you can’t find your way around your website with JavaScript turned off, chances are our web harvester can’t either.
- Use RSS feeds to alert the public (and us) when new documents are posted to the website.
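The JavaScript point can be illustrated with a minimal sketch of how a crawler that does not execute scripts discovers links. This is not the Web Curator Tool's implementation; it uses Python's standard `html.parser` module, and the sample page, the `LinkCollector` class and the `static_links` function are hypothetical names made up for the example.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects href values from <a> tags present in the static HTML.

    Scripts are never executed, so links that only exist after
    JavaScript runs are not seen.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def static_links(html: str) -> list:
    """Return the links a non-JavaScript crawler could follow from this page."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: one plain link, and one link created only by a script.
SAMPLE_PAGE = """
<html><body>
  <a href="/about.html">About</a>
  <script>
    var a = document.createElement('a');
    a.href = '/reports.html';
    a.textContent = 'Reports';
    document.body.appendChild(a);
  </script>
</body></html>
"""

if __name__ == "__main__":
    print(static_links(SAMPLE_PAGE))
```

In this sketch only `/about.html` is discovered; `/reports.html` is reachable in a browser but invisible to a harvester that reads the markup without running scripts. Providing plain links alongside scripted navigation avoids that gap.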

Further information Got questions? Got a new website? Losing a website?!? Gillian Lee: For more information… The New Zealand Web Archive: Legal deposit: