Published by Philomena Francine Garrison. Modified over 9 years ago.
1
Web Archiving at the National Library of Australia
Russell Latham, Senior Web Archivist, National Library of Australia
2
“The Web's ever-expanding size, the dynamic and ephemeral nature of its content, and how this is to be captured, stored and made accessible for the long-term are some of the key questions being addressed by electronic archiving programs.”
PADI, http://www.nla.gov.au/padi/topics/92.html
3
What is web archiving?
A web archive is not the same as the live web
  Brings a different value to web content
Creating artefacts from the web
  Preserved snapshots, slices, gobbets of time
Challenge of timeliness
  At certain times some things are more interesting and valuable
Focus on the future and long-term access (preservation objective)
4
History: web archiving at the NLA
April Fools Day 1996: ‘Electronic Unit’ established
May 1998: public access to PANDORA titles
July 1998: first PANDORA ‘partner’ began participation
  10th participant joined in 2003
June 2001: PANDAS v.1 released
  Web archiving workflow system developed by NLA
2002: Digital Archiving Branch
  Our own identity at last!
  Began first trial of ‘mainstreaming’ web archiving in Serials and Govt Deposit sections
5
History: web archiving at the NLA
August 2002: PANDAS v.2 released
July 2003: joined IIPC
2004: PANDORA added to UNESCO Australian Memory of the World Register
July 2005: first .au domain harvest
  Subsequent harvests in 2006, 2007, 2008 & 2009
December 2006: “Web Archiving and Digital Preservation Branch”
July 2007: PANDAS v.3 released (at last!)
2010: PANDORA search moved to Trove
May 2010: Proposal for whole-of-govt ‘opt-out’ arrangements through SIGB
6
PANDORA Participants
7
What we collect
Selective approach
Collaboration with PANDORA participating agencies
Modest in size
High quality, timely, high value collection, described and searchable
Accessible to the public
8
Searching the collections
Subjects
Browse list
Collections
Agency based
Trove – Archived Websites
Trove – bibliographic
Search engines
9
Collections
National events: Iraq war, 2003; Asia Tsunami, 2004; Bali Bombing, 2002
Political events: Elections; CHOGM; National Apology
Topic based: Extreme sports; Seven Network
Natural events: Floods; Cyclones; Bushfires
10
Subjects/Browsing
When looking for non-specific resources
Wish to browse a topic area
11
Agency based
Use the partners page: http://pandora.nla.gov.au/partners.html
14
Collections
15
Election campaigns
16
1996 Federal Election
1998 Federal Election
2001 Federal Election
2004 Federal Election
2007 Federal Election
2010 Federal Election
17
The Future
Selective
Timely
Small
High Value
Bulk Harvest Collections
Thematic
Domain Harvests
Comprehensive
19
Australian web domain harvests
Annual domain harvests 2005-2009
Working with the Internet Archive
Covers .au top-level domain and a bit more …
No public access
Quantity over quality; content not assessed or described; opportunistic rather than timely
20
Comparative statistics
PANDORA: Files: 115 million; Size: 5.03 TB
Domain harvests (total): Files: 3 billion; Size: 103 TB

Domain harvest   2005         2006         2007         2008         2009
Unique files     185 million  596 million  516 million  1 billion    765 million
Hosts crawled    811,523      1,046,038    1,247,614    3,038,658    1,074,645
Size (TB)        6.69         19.04        18.47        34.55        24.29
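As a rough cross-check, the per-year figures on the comparative statistics slide can be summed and compared with the stated domain harvest totals (about 3 billion files, 103 TB). This is only an illustrative sketch: the dictionary layout and the function name `harvest_totals` are assumptions made here, and the slide's "1 billion" for 2008 is treated as 1,000 million, which is a rounded value.

```python
# Per-year domain harvest figures as listed on the slide.
# Unique files are in millions; "1 billion" (2008) is entered as 1000 (rounded).
harvests = {
    2005: {"files_millions": 185, "hosts": 811_523, "size_tb": 6.69},
    2006: {"files_millions": 596, "hosts": 1_046_038, "size_tb": 19.04},
    2007: {"files_millions": 516, "hosts": 1_247_614, "size_tb": 18.47},
    2008: {"files_millions": 1000, "hosts": 3_038_658, "size_tb": 34.55},
    2009: {"files_millions": 765, "hosts": 1_074_645, "size_tb": 24.29},
}


def harvest_totals(data):
    """Return (total unique files in millions, total size in TB)."""
    total_files = sum(year["files_millions"] for year in data.values())
    total_size = round(sum(year["size_tb"] for year in data.values()), 2)
    return total_files, total_size


total_files_millions, total_size_tb = harvest_totals(harvests)
print(f"Total unique files: {total_files_millions} million")
print(f"Total size: {total_size_tb} TB")
```

The sums come out close to the totals quoted on the slide, which suggests the stated totals are simply the rounded sum of the five annual harvests.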
21
Current status
23