Caught in the Web: Web Archiving at U of A Libraries Geoff Harder and Kenton Good Digital Preservation Seminar | March 5, 2010 | University of Alberta.


1 Caught in the Web: Web Archiving at U of A Libraries Geoff Harder and Kenton Good Digital Preservation Seminar | March 5, 2010 | University of Alberta

2 Official children’s site of the 2000 Sydney Olympics - MIA: http://www.olympics.com/eng/kids/index.html?/eng/kids/home.html

3 GeoCities: 1995-2009 http://www.pcworld.com/article/163765/so_long_geocities_we_forgot_you_still_existed.html

4 Mind the Gap - UK “If websites continue to disappear in the same way as those on President Bush and the Sydney Olympics - perhaps exacerbated by the current economic climate that is killing companies - the memory of the nation disappears too. Historians and citizens of the future will find a black hole in the knowledge base of the 21st century.” Quote: http://www.guardian.co.uk/technology/2009/jan/25/internet-heritage

5 Digital Special Collections: “New definitions need to be created for determining the scope of digital special collections, so that stakeholders can understand the nature of special collections professionals’ responsibilities. These include a responsibility for harvesting and preserving endangered web sites, wikis and other dynamic information resources.” Source: Special Collections in ARL Libraries: A Discussion Report from the ARL Working Group on Special Collections, March 2009

6 Looking ahead…
- 234 million – The number of websites as of December 2009.
- 47 million – Added websites in 2009.
- 126 million – The number of blogs on the Internet (as tracked by BlogPulse).
- 27.3 million – Number of tweets on per day (November, 2009)
- 350 million – People on
- 4 billion – Photos hosted by (October 2009).
- 12.2 billion – Videos viewed per month on in the US (November 2009).
Source: http://royal.pingdom.com/2010/01/22/internet-2009-in-numbers/

7 Does the web matter? Only if our cultural, historical, political, economic, and social memories matter.
- Valuable BUT vulnerable – e.g. a foundation loses funding, or can only afford digital publishing.
- Research and analysis – a longitudinal view requires a complete picture.
- SOMEONE needs to take responsibility for it.

8 Web Archiving: Web archiving is the process of collecting portions of the World Wide Web and ensuring the collection is preserved in an archive, such as an archive site, for future researchers, historians, and the public. Due to the massive size of the Web, web archivists typically employ web crawlers for automated collection. Source: Wikipedia, “Web Archiving”

9 How web archiving works: A web crawler (ant, bot) is a computer program that browses and harvests (captures, collects) the World Wide Web in a methodical, automated manner.
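
The "methodical, automated" harvesting described on this slide can be illustrated with a minimal sketch. It is not how Archive-It or any production crawler is implemented; it only shows the basic loop of starting from seed URLs, capturing each page, and queueing the in-scope links found on it. The `crawl` function, the `fetch` callable, and the in-memory `PAGES` dictionary are hypothetical names introduced for illustration, so the example runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seeds, fetch, max_pages=100):
    """Breadth-first harvest starting from the seed URLs.

    `fetch` is a hypothetical callable that returns the HTML for a URL,
    or None if the page cannot be retrieved. Only links on the same
    hosts as the seeds are followed (a simple scoping assumption).
    """
    allowed_hosts = {urlparse(seed).netloc for seed in seeds}
    queue = deque(seeds)
    seen = set(seeds)
    archive = {}  # captured URL -> captured HTML
    while queue and len(archive) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue  # nothing to capture for this URL
        archive[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            link = urljoin(url, href)  # resolve relative links
            if urlparse(link).netloc in allowed_hosts and link not in seen:
                seen.add(link)
                queue.append(link)
    return archive


# A toy in-memory "web" standing in for real HTTP requests.
PAGES = {
    "http://example.org/": '<a href="/about">About</a> <a href="http://other.example/">Out</a>',
    "http://example.org/about": '<a href="/">Home</a> <a href="/missing">Gone</a>',
}

captured = crawl(["http://example.org/"], PAGES.get)
print(sorted(captured))
```

In this sketch the link to `other.example` is skipped because it falls outside the seed's host, and `/missing` is queued but never captured because the toy `fetch` returns nothing for it.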

10 ARCHIVE-IT

11 Web Archive Admin Screen

12 HCF Collection

13 Seed Management

14 Reports

15 Reports

16 File Type Report

17 Blocked Content Robots.txt
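
As a rough illustration of why content can show up as blocked by robots.txt, the sketch below checks URLs against a site's robots.txt rules using Python's standard `urllib.robotparser` module. The rules in `ROBOTS_TXT`, the `example-archiver` user agent, and the `is_allowed` helper are assumptions made up for this example; they do not describe how Archive-It produces its report.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules without a network call
    return parser.can_fetch(user_agent, url)


allowed = is_allowed(ROBOTS_TXT, "example-archiver", "http://example.org/index.html")
blocked = is_allowed(ROBOTS_TXT, "example-archiver", "http://example.org/private/report.pdf")
print(allowed, blocked)
```

A crawler that honours these rules would capture the first URL and report the second as blocked.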

18 Web Archive Launch Page

23 Exposing Hidden Content

24 U of A Web Archive
- Partner with Internet Archive on the use of Archive-It
- Three targets (criteria: thematic, regional, event-based, organizational):
  1) Heritage Community Foundation (collection at risk)
  2) University of Alberta websites
  3) Western Canadian materials (e.g. political websites)

25 A few resources
- University of Alberta Web Archive
- Archive-it! and Wayback Machine
- IIPC – International Internet Preservation Consortium
- Use Cases for Access to Internet Archives, IIPC Access Working Group
- Special Collections in ARL Libraries, Report March 2009
- GoC Web Archive

26 Thanks
Geoff Harder, Digital Initiatives Coordinator, geoffrey.harder@ualberta.ca
Kenton Good, Web Development Librarian, kenton.good@ualberta.ca

