August 2005, IFLA - CDNL. The International Internet Preservation Consortium (IIPC)

1 The International Internet Preservation Consortium (IIPC)

2 Synopsis
The IIPC - what is it?
Background
IIPC goals and organisation
IIPC issues
IIPC future?
Concluding remarks

3 The IIPC - What is it?
International collaboration for preserving Internet content
Mission: Acquire, preserve and make accessible Internet (WWW) content for future generations
12 participating institutions
–National libraries of Australia, Canada, Denmark, Finland, France, Iceland, Italy, Norway and Sweden; the British Library (UK), the Library of Congress (USA) and the Internet Archive (USA)
Chartered in Paris in July 2003; agreement in effect for 3 years
Future not decided, but IIPC seeks to involve national libraries
IIPC welcomes inquiries about future membership

4 Background
The Internet is a specific medium with attributes of:
–Books, journals, radio, images, video
Characterised by:
–Exponential growth since 1994
–Proliferation
–Immense volume
–Anybody can publish
–Accessible everywhere

5 Archiving the Web – WHY – Who
Presently and in the future, a large and significant part of our culture will exist ONLY on the Internet
If Web pages are not collected in an orderly and continuous manner they will disappear, and with them an important part of the world's cultural and intellectual heritage
Therefore we should:
Preserve material that is only available on the Web
Preserve scholarly data and secure access to it because it is:
–Important and valuable
–Cited
–Difficult to find and locate
A logical extension of national libraries' mission and goals
LEGAL DEPOSIT LAW

6 Evolution of Legal Deposit Law in Iceland
[Timeline figure. Years shown: 1662, 1697, 1886, 1909-1941, 1949, 1977, 2003. Label: WWW]

7 Pre IIPC Development
1996 - Internet Archive, Sweden, Australia
1998 - Nordic co-operation
2000 - 2003 - LoC, BnF, UK, Austria, Slovenia, Czech Republic, Lithuania, Canada
IFLA 2002 - Brewster Kahle presents the IA and Web archiving
September 2002 - IA proposes a project with a few libraries
September 2002 - Meeting in Rome (during ECDL)
January 2003 - Meeting in Paris (COBRA +)
July 2003 - IIPC incorporated

8 IIPC Goals
To build a virtual global distributed collection, so that the distributed and linked nature of the original web material is not lost forever
To find a new way of collaborating among national heritage institutions, in order to create a network of heritage institutions that can build and preserve the global distributed collection
[Diagram labels: Global information space of the Internet; Global Distributed Collection]

9 IIPC Organisation
Steering group: one person from each institution
Working groups
–Access
–Content Management
–Deep Web
–Framework
–Metrics and Testbed
–Researchers Requirements

10 IIPC Objectives
Collaborative work, within each country's legislative framework, to identify, develop and facilitate implementation of solutions for selecting, collecting, preserving and providing access to internet content
Facilitate international coverage of internet content archive collections within national legal frameworks, in accordance with national collection policies
International advocacy for initiatives that encourage the collection and preservation of, and access to, internet content
Provide a forum for sharing knowledge about internet content archiving both within the Consortium and beyond
Develop and recommend standards
Develop interoperable tools and techniques to acquire, archive and provide access to web sites
Raise awareness of internet preservation issues and initiatives through conferences, workshops, training events and publications

11 IIPC Results: Intangible
Common understanding and clarification of issues
Definition of the overall architecture for web archiving, with system interface specifications
Proposed standards for Web Archive file format and metadata
Access requirements, with use cases illustrating a common understanding of the functionality of a web archive
Identification and requirement specification of new access tools
Curator tool for controlling and scheduling the collection of web content
Definition of the WARC (Web ARChive) file format to store information blocks harvested by web crawlers
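
To make "information blocks harvested by web crawlers" concrete, here is a minimal Python sketch of how one harvested block could be written as a self-describing record. It is not taken from the presentation: the field names and line layout follow the WARC 1.0 format as later published, which may differ from the 2005 definition the slide refers to, and the function name `build_record` is hypothetical.

```python
import uuid
from datetime import datetime, timezone


def build_record(target_uri: str, payload: bytes, captured_at: datetime) -> bytes:
    """Serialise one harvested block as a header section followed by the payload.

    Assumption: layout modelled on the later published WARC 1.0 format
    (version line, named fields, blank line, block, two CRLFs).
    `captured_at` must be a timezone-aware datetime.
    """
    utc_stamp = captured_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        f"WARC-Date: {utc_stamp}",
        f"WARC-Target-URI: {target_uri}",
        f"Content-Length: {len(payload)}",  # length of the block in bytes
    ]
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    return head + payload + b"\r\n\r\n"


record = build_record(
    "http://example.org/",
    b"hello",
    datetime(2005, 8, 1, 12, 0, 0, tzinfo=timezone.utc),
)
```

Because each record carries its own URI, capture date and length, many records can be concatenated into one large archive file and still be read back one at a time.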

12 IIPC Results: Tangible
Heritrix Crawler/Harvester
–Smart crawling
–Continuous harvesting
Full Text Indexer/Search Engine
–Searching/browsing the content of a Web Archive
Extract data from an archived database
ARC file manipulation tool

13 IIPC Future - Issues
Collection building
–Broad scope: representative collection of the Web
–Narrow scope: in-depth collection of selected sites
Registration
–Cataloguing is not possible
–Indexing of text (with time element)
Access
–Direct, using a URL
–Search engine (Google type)
–Data mining (analytical and statistical methods)
Long-term preservation of a web archive: a conscious omission
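
The "time element" and "direct using a URL" access mentioned on this slide can be illustrated with a small Python sketch: an index that maps each URL to its dated captures and returns the version that was current at a requested moment. This is an illustration only, not an IIPC tool; the class `CaptureIndex`, its methods and the location strings are hypothetical.

```python
from bisect import bisect_right, insort
from datetime import datetime


class CaptureIndex:
    """Maps a URL to its captures, kept sorted by capture time (illustrative only)."""

    def __init__(self):
        self._by_url = {}  # url -> sorted list of (captured_at, location)

    def add(self, url, captured_at, location):
        # insort keeps the per-URL list ordered by timestamp
        insort(self._by_url.setdefault(url, []), (captured_at, location))

    def lookup(self, url, at):
        """Return the latest capture taken at or before `at`, or None."""
        captures = self._by_url.get(url, [])
        timestamps = [stamp for stamp, _ in captures]
        position = bisect_right(timestamps, at)
        return captures[position - 1] if position else None


index = CaptureIndex()
index.add("http://example.org/", datetime(2005, 1, 1), "b.arc:10")
index.add("http://example.org/", datetime(2004, 1, 1), "a.arc:0")
hit = index.lookup("http://example.org/", datetime(2004, 6, 1))
```

Here `hit` is the 2004 capture, because the 2005 one did not yet exist in mid-2004. A full-text index would need the same timestamp on every posting so that a search can be restricted to a period.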

14 IIPC Future
Current IIPC charter ends in July 2006
Proposals for continuation will be discussed at the next meeting in late October 2005
The challenge is to keep the work focused and effective
Many unsolved problems remain, and hopefully new members can help

15 Concluding remarks
Creating and accessing a Web Archive is:
–Very complex, challenging and exciting; neither a problem nor a burden
Collection – Preservation – Access
The first phase has started
Our knowledge of the Web and its contents is incomplete
All present software and tools must be improved
International cooperation is needed to:
–Define and develop standards, techniques and methods
–Create national and even global Web Archives
–Provide access to the archives

17 National Bibliography - from Print to Digital
[Diagram. Labels: Books/Journals/Sound Rec.; Video/Micro/CDs; Manuscr.; Internet; Films; INDEX; Present National Bibliography; National Bibliography reflecting new law; Bibliography of National Cultural Heritage; National; Gallery; Archive; Museum]
