Web Archiving at the National Library of Australia
Russell Latham, Senior Web Archivist, National Library of Australia

Presentation transcript:

“The Web's ever-expanding size, the dynamic and ephemeral nature of its content, and how this is to be captured, stored and made accessible for the long-term are some of the key questions being addressed by electronic archiving programs.”
PADI

What is web archiving?
- A web archive is not the same as the live web
- Brings a different value to web content
- Creating artefacts from the web
- Preserved snapshots, slices, gobbets of time
- Challenge of timeliness
- At certain times some things are more interesting and valuable
- Focus on the future and long term access (preservation objective)

History: web archiving at the NLA
- April Fools Day 1996: ‘Electronic Unit’ established
- May 1998: public access to PANDORA titles
- July 1998: first PANDORA ‘partner’ began participation
  - 10th participant joined in 2003
- June 2001: PANDAS v.1 released
  - Web archiving workflow system developed by NLA
- 2002: Digital Archiving Branch
  - Our own identity at last!
  - Began first trial of ‘mainstreaming’ web archiving in Serials and Govt Deposit sections

History: web archiving at the NLA
- August 2002: PANDAS v.2 released
- July 2003: joined IIPC
- 2004: PANDORA added to UNESCO Australian Memory of the World Register
- July 2005: first .au domain harvest
  - Subsequent harvests in 2006, 2007, 2008 & 2009
- December 2006: “Web Archiving and Digital Preservation Branch”
- July 2007: PANDAS v.3 released (at last!)
- 2010: PANDORA search moved to Trove
- May 2010: Proposal for whole-of-govt ‘opt-out’ arrangements through SIGB

PANDORA Participants

What we collect
- Selective approach
- Collaboration with PANDORA participating agencies
- Modest in size
- High quality, timely, high value collection, described and searchable
- Accessible to the public

Searching the collections
- Subjects
- Browse list
- Collections
- Agency based
- Trove – Archived Websites
- Trove – bibliographic
- Search engines

Collections
- National events: Iraq war, 2003; Asia Tsunami, 2004; Bali Bombing, 2002
- Political events: Elections; CHOGM; National Apology
- Topic based: Extreme sports; Seven Network
- Natural events: Floods; Cyclones; Bushfires

Subjects/Browsing
- When looking for non-specific resources
- Wish to browse a topic area

Agency based
- Use the partners page

Collections

Election campaigns

- 1996 Federal Election
- 1998 Federal Election
- 2001 Federal Election
- 2004 Federal Election
- 2007 Federal Election
- 2010 Federal Election

The Future
- Selective: timely, small, high value
- Bulk harvest collections: thematic
- Domain harvests: comprehensive

Australian web domain harvests
- Annual domain harvests
- Working with the Internet Archive
- Covers .au top level domain and a bit more …
- No public access
- Quantity over quality; content not assessed or described; opportunistic rather than timely

Comparative statistics

PANDORA
- Files: 115 million
- Size: 5.03 TB

Domain Harvest (one value per harvest)
- Unique files: 185 million; 596 million; 516 million; 1 billion; 765 million
- Hosts crawled: 811,523; 1,046,038; 1,247,614; 3,038,658; 1,074,645
- Size TBs:

Domain Harvests (total)
- Files: 3 billion
- Size: 103 TB

Current status
