Web Archiving at the National Library of Australia. Russell Latham, Senior Web Archivist, National Library of Australia.


1 Web Archiving at the National Library of Australia  Russell Latham, Senior Web Archivist, National Library of Australia

2 “The Web's ever-expanding size, the dynamic and ephemeral nature of its content, and how this is to be captured, stored and made accessible for the long-term are some of the key questions being addressed by electronic archiving programs.” PADI http://www.nla.gov.au/padi/topics/92.html

3 What is web archiving?  A web archive is not the same as the live web  Brings a different value to web content  Creating artefacts from the web  Preserved snapshots, slices, gobbets of time  Challenge of timeliness  At certain times some things are more interesting and valuable  Focus on the future and long term access (preservation objective)

4 History: web archiving at the NLA  April Fools Day 1996: ‘Electronic Unit’ established  May 1998: public access to PANDORA titles  July 1998: first PANDORA ‘partner’ began participation  10th participant joined in 2003  June 2001: PANDAS v.1 released  Web archiving workflow system developed by NLA  2002: Digital Archiving Branch  Our own identity at last!  Began first trial of ‘mainstreaming’ web archiving in Serials and Govt Deposit sections

5 History: web archiving at the NLA  August 2002: PANDAS v.2 released  July 2003: joined IIPC  2004: PANDORA added to UNESCO Australian Memory of the World Register  July 2005: first .au domain harvest  Subsequent harvests in 2006, 2007, 2008 & 2009  December 2006: “Web Archiving and Digital Preservation Branch”  July 2007: PANDAS v.3 released (at last!)  2010: PANDORA search moved to Trove  May 2010: Proposal for whole-of-govt ‘opt-out’ arrangements through SIGB

6 PANDORA Participants

7 What we collect  Selective approach  Collaboration with PANDORA participating agencies  Modest in size  High quality, timely, high value collection, described and searchable  Accessible to the public

8 Searching the collections  Subjects  Browse list  Collections  Agency based  Trove – Archived Websites  Trove – bibliographic  Search engines

9 Collections  National Events: Iraq war, 2003; Asia Tsunami, 2004; Bali Bombing, 2002  Political Events: Elections; CHOGM; National Apology  Topic Based: Extreme sports; Seven Network  Natural events: Floods; Cyclones; Bushfires

10 Subjects/Browsing  When looking for non-specific resources  Wish to browse a topic area

11 Agency based  Use the partners page http://pandora.nla.gov.au/partners.html

12

13

14 Collections

15 Election campaigns

16 1996 Federal Election; 2001 Federal Election; 2004 Federal Election; 2007 Federal Election; 2010 Federal Election; 1998 Federal Election

17 The Future Selective Timely Small High Value Bulk Harvest Collections Thematic Domain Harvests Comprehensive

18

19 Australian web domain harvests  Annual domain harvests 2005-2009  Working with the Internet Archive  Covers .au top-level domain and a bit more …  No public access  Quantity over quality; content not assessed or described; opportunistic rather than timely

20 Comparative statistics
PANDORA          Files: 115 million   Size: 5.03 TB
Domain Harvest   2005          2006          2007          2008          2009
Unique files     185 million   596 million   516 million   1 billion     765 million
Hosts crawled    811,523       1,046,038     1,247,614     3,038,658     1,074,645
Size (TB)        6.69          19.04         18.47         34.55         24.29
Domain Harvests  Files: 3 billion     Size: 103 TB
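
The yearly figures on the comparative statistics slide can be cross-checked against the stated totals. The following is a minimal Python sketch, not part of the presentation. It assumes the "1 billion" figure for 2008 is rounded and treats it as 1,000 million. The variable and function names are illustrative only.

```python
# Yearly domain harvest figures copied from the comparative statistics slide.
# Assumption: "1 billion" unique files (2008) is treated as 1,000 million.
domain_harvests = {
    2005: {"unique_files_millions": 185, "hosts": 811_523, "size_tb": 6.69},
    2006: {"unique_files_millions": 596, "hosts": 1_046_038, "size_tb": 19.04},
    2007: {"unique_files_millions": 516, "hosts": 1_247_614, "size_tb": 18.47},
    2008: {"unique_files_millions": 1000, "hosts": 3_038_658, "size_tb": 34.55},
    2009: {"unique_files_millions": 765, "hosts": 1_074_645, "size_tb": 24.29},
}

PANDORA_SIZE_TB = 5.03  # size of the selective PANDORA archive, from the slide


def harvest_totals(harvests):
    """Sum unique files (in millions) and size (in TB) across all years."""
    files = sum(year["unique_files_millions"] for year in harvests.values())
    size = round(sum(year["size_tb"] for year in harvests.values()), 2)
    return files, size


total_files_millions, total_size_tb = harvest_totals(domain_harvests)

# How many times larger the combined domain harvests are than PANDORA.
size_ratio = round(total_size_tb / PANDORA_SIZE_TB, 1)

print(f"Total unique files: about {total_files_millions / 1000:.1f} billion")
print(f"Total size: {total_size_tb} TB")
print(f"Domain harvests are roughly {size_ratio} times the size of PANDORA")
```

Under these assumptions the yearly sizes sum to 103.04 TB and the file counts to about 3.06 billion, consistent with the slide's rounded totals of 103 TB and 3 billion files.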

21 Current status

22

23

