1 What is the Internet Archive
We are a digital library.
Mission statement: universal access to human knowledge.
Founded in 1996 by Brewster Kahle in San Francisco, California.
Officially designated a library by the State of California (2007).

2 Archive-It (www.archive-it.org)
First deployed in February 2006.
Web-based application that allows users to create, manage, and preserve collections of digital web content.
Functions include: selection and scoping, harvesting, reports and analysis of captures, cataloging with metadata, and full-text search.
Archived content includes: text, HTML, video, audio, images, PDF, online newspapers, social networking, and more.
Includes hosting, access, and storage (primary and back-up).
Archived content is available for viewing 24 hours after a crawl has completed.

3 The Tools Behind Archive-It
Open source technology, primarily developed by the Internet Archive, the open source community, and the IIPC.
Heritrix: web crawler. Crawls and captures pages.
Wayback Machine: access tool for rendering and viewing archived web pages; surf the web as it was.
NutchWAX: open source search engine. Standard full-text search.
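As a rough illustration of how a Wayback-style access tool addresses a capture, the sketch below builds a URL from a capture time and the original address. It assumes the public Wayback Machine pattern (`https://web.archive.org/web/<timestamp>/<url>`); the function name `wayback_url` is hypothetical, and Archive-It collections may be served from a different host and path that these slides do not specify.

```python
from datetime import datetime

# Assumption: the public Wayback Machine prefix, not necessarily the one
# used for Archive-It partner collections.
WAYBACK_PREFIX = "https://web.archive.org/web"


def wayback_url(original_url: str, when: datetime) -> str:
    """Build a Wayback-style URL for a page as it looked near `when`."""
    # Wayback timestamps are 14 digits: YYYYMMDDhhmmss.
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}/{timestamp}/{original_url}"


if __name__ == "__main__":
    print(wayback_url("http://www.archive-it.org/", datetime(2010, 1, 15, 12, 0, 0)))
```

The access tool resolves such a request to the capture closest to the requested time, which is what makes "surfing the web as it was" possible.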

4 Who Uses Archive-It
130 partners in 42 states and 12 countries:
35% University and College Libraries
30% State Archives and Libraries
15% Non-Governmental Nonprofits
9% National Libraries/Federal Institutions
7% K-12 Schools
2% Cities and Public Libraries
2% Museums and Art Libraries
http://www.archive-it.org/public/partners

6 Archive-It Web Application

7 Why Archive Social Networking Sites?
State agencies and officials: an increasing number have decided that the content on these sites is a record and needs to be archived.
University libraries: these sites are used to share information with students and alumni, and they contain important records about a school's culture, student body, and campus events.
Researchers: archiving preserves valuable social reactions and change on topics of interest.
Currently, about 20 Archive-It partners are archiving content from these sites.

8 North Carolina State Archives & State Library of North Carolina
Purpose: archive state agency websites and publications.
Includes pages in a variety of formats: text, images, audio, video, and social networking sites.
Archive-It partner since 2005 (pilot partner).

9 North Carolina State Archives & State Library of North Carolina

10 North Carolina State Archives & State Library of North Carolina

11 Library of Virginia
Purpose: preserve websites relating to Virginia government and elections.
Collection on the current Governor includes Twitter and Flickr sites.
Collection on the Twitter, Flickr, and Facebook sites of politicians and political organizations in Virginia.

14 Stanford University, Islamic and Middle Eastern Collection
Purpose: harvest and preserve Iranian blogs.
Archiving over 300 blogs written by and for Iran and the Iranian people.
Archiving sites from Twitter, Facebook, and YouTube selected by the collection's curators.
Partner since February 2008; funded by the Library of Congress.

17 University of Texas, San Antonio
Purpose: archive university websites, student organizations, academic departments, and other local topics important to the university.
Archiving blogs, Facebook, Twitter, Flickr, and MySpace.
Partner since 2008.

20 Typical Challenges
Content behind log-ins cannot be archived.
Content can be blocked by robots.txt files (which our crawlers respect by default).
Some parts of sites are not "archive-friendly" (e.g. complex JavaScript, Flash).
These sites tend to change both their technical structure and their policies quickly and often.
The structure of the sites and their URLs means users need to add scoping rules to capture only the content they are interested in.
Each site has its own unique set of challenges.
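The robots.txt and scoping challenges can be made concrete with a small sketch. This is not how Heritrix or Archive-It implement these checks; it only illustrates the two decisions a crawler makes for each URL, using Python's standard `urllib.robotparser`. The robots.txt content, the host `social.example.com`, the user agent `example-crawler`, and the helper names are all hypothetical.

```python
from urllib.parse import urlparse
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary social networking site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def load_robots(text: str) -> RobotFileParser:
    """Parse robots.txt rules from a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def in_scope(url: str, host: str, path_prefix: str) -> bool:
    """Simplified scoping rule: same host, and the path starts with the prefix."""
    parts = urlparse(url)
    return parts.netloc == host and parts.path.startswith(path_prefix)


def should_capture(url: str, robots: RobotFileParser, host: str,
                   path_prefix: str, user_agent: str = "example-crawler") -> bool:
    """Capture a URL only if it is in scope and robots.txt allows it."""
    return in_scope(url, host, path_prefix) and robots.can_fetch(user_agent, url)


if __name__ == "__main__":
    robots = load_robots(ROBOTS_TXT)
    for url in [
        "https://social.example.com/exampleuniversity/photos",
        "https://social.example.com/someoneelse/photos",
        "https://social.example.com/private/exampleuniversity",
    ]:
        print(url, should_capture(url, robots, "social.example.com", "/exampleuniversity"))
```

A path-prefix rule like this is why scoping matters on social networking sites: every account lives on the same host, so a host-only rule would pull in far more than the one profile a partner wants.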

21 Overall Approaches
Trial and error: try to harvest with a variety of settings.
Quality review: review archived content thoroughly.
Collaborate: compare approaches and results with other Archive-It users.
Document: write up detailed instructions, lessons learned, and best practices for other partners.

22 Thank you!
www.archive-it.org
http://www.facebook.com/ArchiveIt
Kate Odell, Partner Specialist, Internet Archive
kate@archive.org

