1 Archive-It: Archiving and Preserving Born Digital Content NDIIPP June 2009 Molly Bragg Partner Specialist Internet Archive.

Presentation transcript:

1 Archive-It: Archiving and Preserving Born Digital Content NDIIPP June 2009 Molly Bragg Partner Specialist Internet Archive

2 About Internet Archive
Nonprofit founded in 1996 by Brewster Kahle
Universal access to human knowledge
Officially designated a library by the state of California (2007)
Built on open source software and dedicated to open source principles
Current archive is 150 billion pages
Largest publicly accessible web archive:

3 How do we collect it?
Open source technology primarily developed by Internet Archive and IIPC
Heritrix: web crawler that crawls and captures pages
Wayback Machine: access tool for rendering and viewing pages. Displays archived web pages so users can surf the web as it was.
NutchWAX: open source search engine for standard full-text search
WARC file: archival file format used for preservation; an ISO standard
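As an illustration of the WARC file format named on this slide, the following is a minimal sketch of how a single uncompressed WARC record is laid out and how its header could be read back. It assumes the WARC/1.0 layout (a version line, named header fields, a blank line, then a payload of Content-Length bytes followed by two CRLFs). The helper names `build_record` and `parse_record` and the sample URL are hypothetical, and this is not how Heritrix or Archive-It implement it.

```python
import uuid
from datetime import datetime, timezone

CRLF = b"\r\n"


def build_record(target_uri: str, payload: bytes, record_type: str = "resource") -> bytes:
    """Serialize one uncompressed WARC/1.0 record (hypothetical helper)."""
    headers = [
        ("WARC-Type", record_type),
        ("WARC-Record-ID", f"<urn:uuid:{uuid.uuid4()}>"),
        ("WARC-Date", datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")),
        ("WARC-Target-URI", target_uri),
        ("Content-Type", "text/html"),  # assumed payload type for this example
        ("Content-Length", str(len(payload))),
    ]
    head = b"WARC/1.0" + CRLF
    for name, value in headers:
        head += f"{name}: {value}".encode("utf-8") + CRLF
    # A blank line ends the header; two CRLFs end the record.
    return head + CRLF + payload + CRLF + CRLF


def parse_record(data: bytes):
    """Split a record into (version, header fields, payload bytes)."""
    head, _, rest = data.partition(CRLF + CRLF)
    lines = head.split(CRLF)
    version = lines[0].decode("utf-8")
    fields = {}
    for line in lines[1:]:
        # Split on the first colon only, since values such as URIs contain colons.
        name, _, value = line.decode("utf-8").partition(":")
        fields[name.strip()] = value.strip()
    length = int(fields["Content-Length"])
    return version, fields, rest[:length]


if __name__ == "__main__":
    record = build_record("http://example.org/", b"<html>hello</html>")
    version, fields, body = parse_record(record)
    print(version, fields["WARC-Type"], fields["WARC-Target-URI"], len(body))
```

WARC files produced by crawlers are typically gzip-compressed record by record and hold many records per file; the sketch leaves that out to keep the record layout visible.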

4 Archive-It
Web-based application that allows users to create, manage and preserve collections of born digital content
Annual subscription service that includes hosting, access and storage
Partners do not need significant technical infrastructure or personnel resources
Functions include harvesting, scoping, full-text search, cataloging with metadata, and reports and analysis of collections

5 Archive-It Partners
First deployed in January 2006
Current total: 102 partners
39% University and Public Libraries
30% State Archives and Libraries
10% High Schools
10% Non Government Non Profits
5% National Libraries
4% Federal Institutions
2% Museums

6 Access to Born Digital Content
Access = Use = Funding
Various ways to access collections online:
– Private web application with login/password
– Archive-It public website
– Partners’ websites: landing pages with institutions’ layout, look and feel
– Restricted and private access options available

9 What is compelling about archived web content?
“At risk” content needs to be preserved before it is lost
More primary source information is only available in born-digital format
Diverse range of content included in one location (website)
Need to document history from multiple perspectives for future generations

10 Archive-It Application

[Screenshot: Archive-It web application]

16 How Partners Use Archive-It

17 Stanford University, Islamic and Middle Eastern Collection
Purpose: harvest and preserve Iranian blogs
Archiving over 300 blogs written by and for Iran and the Iranian people
Also includes coverage of current Iranian elections
Partner since February
million URLs, 1.4 terabytes of data

20 Virginia Tech University
Purpose: capture an event as it unfolds on the web and changes rapidly
Quick set-up and archive on demand
University sites, news sites, blogs
Crisis, Tragedy and Preservation Consortium
Northern Illinois University shooting (February 2008)
5.3 million URLs, 330 gigabytes of data

22 Electronic Literature Organization
Purpose: archive born digital literature
Poems and stories that are generated by computers, either interactively or based on parameters given at the beginning
Collect individual works, collections/journals, and critical opinion
Archive-It Partner since July
million URLs, 340 GB of data

– 2010 Programs
K12 Web Archiving Program
9 schools 2008 –
Applications for program begin mid-July
Spanish User Interface
Global Spanish-speaking partners
US Hispanic population

27 Thank you!
Molly Bragg, Partner Specialist, ext. 6
Kristine Hanna, Director, Web Archiving Services, ext. 5