Memento Update CNI Task Force Meeting, Spring 2011 1 Memento Herbert Van de Sompel Robert Sanderson Michael L. Nelson Giant Leaps.


1 Memento Update CNI Task Force Meeting, Spring 2011 1 Memento http://mementoweb.org/ Herbert Van de Sompel Robert Sanderson Michael L. Nelson Giant Leaps Towards Seamless Navigation of the Web of the Past

2 Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies

3 Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies

4 Memento wants to make it easy to access the Web of the Past.

5 [Screenshots: Tate Online today; Select Date: March 16 2008; Tate Online on March 16 2008, from the National Archives]

6 [Screenshots: Tate Online today; Select Date: March 16 2008; Tate Online on March 16 2008, from the National Archives; labels: Dynamic, Static]

7 Memento achieves this by introducing a uniform version access capability to integrate the present and past Web.

8 Content Management Systems: designed to be aware of all versions of a resource; self-contained; variety of proprietary version mechanisms; versions interlinked using proprietary mechanisms; dynamism is managed.

9 World Wide Web: designed to forget about prior versions of a resource; distributed; dynamism from a management perspective is ignored.

10 There are resource versions on the Web: content management systems; Web archives; transactional archives; search engine caches.

11 But the Web architecture has a hard time dealing with them: cannot talk about a resource as it used to exist; cannot access a prior version knowing the current one; cannot access the current version knowing a prior one; current approaches are ad hoc and localized.

12 Memento: regards the Web as a big Content Management System; introduces a uniform capability to access versions on the Web; does not build new archives but leverages all systems that host versions: Web archives, Content Management Systems, Software Version Systems, etc.

13 Memento’s version access approach: is distributed (versions may exist on several servers); uses time as a global version indicator; is based on the primitives of the Web: resource, resource state, representation, content negotiation, link.
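
The pairing of "time as a global version indicator" with content negotiation can be sketched as a single request header. This is a minimal illustration, not the project's client code: the header name Accept-Datetime is taken from the Memento Internet Draft rather than from these slides, and the helper name accept_datetime_header is hypothetical.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def accept_datetime_header(when: datetime) -> dict:
    """Return request headers asking for the version that was live at `when`."""
    # HTTP dates are expressed in GMT, so normalize the datetime first.
    when_utc = when.astimezone(timezone.utc)
    # Header name as used in the Memento Internet Draft (an assumption here).
    return {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}


# The date shown on the Tate Online slides.
headers = accept_datetime_header(datetime(2008, 3, 16, tzinfo=timezone.utc))
print(headers)
```

A client would send these headers along with an ordinary GET request; the server side of the negotiation is not modeled here.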

14 Original Resource and Versions

15 Bridge from Present to Past

16 Bridge from Past to Present

17 Memento Framework

18 Multiple Archives

19 Memento Client-Server Interaction
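
The bridges between present and past are carried by HTTP links, so a client mostly needs to read Link headers. The sketch below is a simplified parser under stated assumptions: the relation names "original", "timegate" and "memento" follow the Memento Internet Draft, all example URIs are made up, and every link parameter is assumed to be a quoted string.

```python
import re

# One link: <uri> followed by zero or more ;name="value" parameters.
LINK_RE = re.compile(r'<([^>]+)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value):
    """Split an HTTP Link header value into dicts with 'uri' plus parameters."""
    links = []
    for uri, raw_params in LINK_RE.findall(value):
        link = {"uri": uri}
        link.update(PARAM_RE.findall(raw_params))
        links.append(link)
    return links


def find_rel(links, rel):
    """Return the URI of the first link carrying the given relation type."""
    for link in links:
        if rel in link.get("rel", "").split():
            return link["uri"]
    return None


# Hypothetical Link header, as an archived version might return it.
EXAMPLE = (
    '<http://www.example.org/>; rel="original", '
    '<http://archive.example.org/timegate/http://www.example.org/>; rel="timegate", '
    '<http://archive.example.org/20080316/http://www.example.org/>; '
    'rel="memento"; datetime="Sun, 16 Mar 2008 00:00:00 GMT"'
)
links = parse_link_header(EXAMPLE)
print(find_rel(links, "timegate"))
print(find_rel(links, "original"))
```

From a current page the client follows "timegate" towards the past; from an archived page it follows "original" back to the present.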

20 Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies

21 Significant progress has been made towards seamless navigation of the Web of the Past.

22 Standardization: standardization process started via the IETF; interest from IETF and W3C; encouraged by major Web architects, including Tim Berners-Lee, Mark Nottingham, Michael Hausenblas. https://datatracker.ietf.org/doc/draft-vandesompel-memento/

23 Memento Clients: several client tools developed by us and others; add-ons for Firefox (operational) and Internet Explorer (experimental); applications for Android (operational) and iPhone/iPad (in development); paper in next issue of Code4Lib Journal. http://www.mementoweb.org/tools/

24 Memento Server Support (1): Memento-compliant Wayback software: used by Internet Archive; available to Web archives, worldwide. Please have your favorite Web Archive install this new version 1.6! http://www.mementoweb.org/tools/

25 Memento Server Support (2): plug-in for MediaWiki (operational); used on W3C’s main wiki. Please install it for your MediaWiki! http://www.mementoweb.org/tools/

26 Memento Server Validator: server-side client; attempts to perform all Memento actions against a given URI; reports success/failure of the interactions and warnings for optional aspects; kept up to date with IETF Internet Draft. http://www.mementoweb.org/tools/

27 Memento Proxy Support: several systems that host Mementos made Memento-compliant “by proxy”: all major Web Archives that do not yet run Memento-compliant Wayback software; 3,000+ MediaWiki systems, including Wikipedia. We want all of these to become natively Memento compliant!

28 Memento Website: ongoing effort to add materials that support understanding and adoption: introduction to Memento; how to recognize Mementos, TimeGates, Original Resources; guidelines for servers that host Mementos (Web Archives, CMS, snapshot archives, etc.). http://www.mementoweb.org/guide/

29 Funding: 2007-2010: US $250K grant from Library of Congress, approx. 50K on Memento. 2010-2011: US $1 Million follow-up grant from Library of Congress, for specification, outreach, tool development, further research.

30 Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies

31 Memento Time Travel is really powerful. Time-Series Data via HTTP follow-your-nose.

32 Memento Framework

33 Time Series for Humans. Original Resource: http://lanlsource.lanl.gov/pics/picoftheday.png

34 Time Travel across versions of a Picture of the Day: data collected through HTTP navigation.

35 Thanks Christine! [Diagram labels: Data, Process, change over time, Reproducibility] But if we had static, discoverable snapshots of the data and the process…

36 Time Series for Machines. Original Resource: http://dbpedia.org/resource/France

37 Time Travel across versions of DBpedia: data collected through HTTP navigation; paper at http://arxiv.org/abs/1003.3661
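
The follow-your-nose idea behind both time-series slides can be sketched as a loop that walks from one version to the next and records a value at each step. Everything below is illustrative: the in-memory VERSIONS table stands in for real HTTP navigation, the "next" key is a hypothetical placeholder for whatever link the archive exposes, and the values are invented, not DBpedia data.

```python
# Hypothetical stand-in for an archive: each version holds an observed value
# and a pointer to the next version (None at the end of the chain).
VERSIONS = {
    "memento/1": {"datetime": "2008-01-01", "value": 10, "next": "memento/2"},
    "memento/2": {"datetime": "2009-01-01", "value": 12, "next": "memento/3"},
    "memento/3": {"datetime": "2010-01-01", "value": 15, "next": None},
}


def collect_time_series(first_uri, fetch):
    """Follow 'next' pointers from first_uri and return (datetime, value) pairs."""
    series = []
    seen = set()
    uri = first_uri
    # Stop at the end of the chain, and guard against accidental link cycles.
    while uri is not None and uri not in seen:
        seen.add(uri)
        version = fetch(uri)
        series.append((version["datetime"], version["value"]))
        uri = version["next"]
    return series


series = collect_time_series("memento/1", VERSIONS.__getitem__)
print(series)
```

Passing a fetch function keeps the traversal separate from the transport, so the same loop could sit on top of real HTTP requests.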

38 Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies

39 Very few Web sites provide a “timegate” link. Need additional mechanisms to support Discovery.

40 Batch discovery of Mementos: TimeMaps. A TimeMap minimally lists: the URI and datetime of the Mementos known to an archive; the URI of the Original Resource. TimeMaps can be aggregated across systems that host Mementos.
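
A TimeMap as described here is a small data structure, and aggregation is a merge. The sketch below assumes nothing about the serialization format: the class names MementoEntry and TimeMap, the aggregate helper, and all URIs are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class MementoEntry:
    uri: str        # URI of the Memento
    when: datetime  # datetime of the Memento


@dataclass
class TimeMap:
    original: str   # URI of the Original Resource
    mementos: list  # MementoEntry items known to one archive


def aggregate(timemaps):
    """Merge TimeMaps for the same Original Resource into one sorted TimeMap."""
    originals = {tm.original for tm in timemaps}
    if len(originals) != 1:
        raise ValueError("expected TimeMaps for exactly one Original Resource")
    # A set drops entries that two archives report identically.
    merged = {entry for tm in timemaps for entry in tm.mementos}
    ordered = sorted(merged, key=lambda entry: (entry.when, entry.uri))
    return TimeMap(originals.pop(), ordered)


archive_a = TimeMap("http://www.example.org/", [
    MementoEntry("http://a.example.net/2008/http://www.example.org/", datetime(2008, 3, 16)),
])
archive_b = TimeMap("http://www.example.org/", [
    MementoEntry("http://b.example.net/2005/http://www.example.org/", datetime(2005, 6, 1)),
    MementoEntry("http://a.example.net/2008/http://www.example.org/", datetime(2008, 3, 16)),
])
combined = aggregate([archive_a, archive_b])
print([entry.uri for entry in combined.mementos])
```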

41 Batch discovery of Mementos: Feed of TimeMaps. A system that hosts Mementos exposes a feed (e.g. Atom) of TimeMaps to allow applications to remain in sync with its evolving Memento collection: one Atom entry per Original Resource for which the system hosts Mementos; the entry provides a “timemap” link to a TimeMap for the Original Resource; the datetime value of the updated field of the entry changes when an additional Memento for the Original Resource becomes available (i.e. the TimeMap changes); the ID of the entry is a tag URI based on the URI of the Original Resource. Will be proposed to IIPC.
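
One entry of such a feed might be built as follows. This is only a sketch of the four properties listed on the slide: the exact way the tag URI is derived from the Original Resource is not given in the slides, so the scheme used here (host as authority, path as specific part, a fixed tag date) is an assumption, as are the function name and the example URIs.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit

ATOM_NS = "http://www.w3.org/2005/Atom"


def timemap_entry(original_uri, timemap_uri, updated, tag_date="2011"):
    """Build one Atom entry describing the TimeMap of one Original Resource."""
    parts = urlsplit(original_uri)
    # Assumed derivation of the tag URI; the slides do not specify one.
    tag_id = f"tag:{parts.netloc},{tag_date}:{parts.path or '/'}"
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}id").text = tag_id
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = original_uri
    # 'updated' changes whenever the TimeMap gains a Memento.
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated
    ET.SubElement(entry, f"{{{ATOM_NS}}}link", rel="timemap", href=timemap_uri)
    return entry


entry = timemap_entry(
    "http://www.example.org/page",
    "http://archive.example.net/timemap/http://www.example.org/page",
    "2011-04-04T12:00:00Z",
)
print(ET.tostring(entry, encoding="unicode"))
```

An application staying in sync would compare the updated value of each entry with the one it saw last and refetch only the TimeMaps that changed.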

42 Batch discovery of Mementos: robots.txt. The robots.txt file is used by Web servers to convey crawling policies. Add a directive to support discovery of Mementos known to the server: a pointer to a single Memento can suffice, as the robot can crawl on from there; Mementos allow for discovery of TimeMaps via HTTP links. E.g. jcdl.org hosts snapshot archives of prior JCDL conferences and adds the following to its robots.txt: Memento: jcdl.org/archive/2002/index.html (will be promoted via Internet Draft)
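
A crawler could pick up the proposed directive with a few lines of parsing. The sketch below treats "Memento:" exactly as the slide shows it; since the directive is only a proposal, its final syntax may differ, and the helper name proposed_directives and the sample file contents other than the Memento line are assumptions.

```python
def proposed_directives(robots_txt, name):
    """Return the values of every `name:` line found in a robots.txt body."""
    values = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split on the first colon only,
        # so that URIs containing colons survive in the value.
        line = line.split("#", 1)[0].strip()
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == name.lower():
            values.append(value.strip())
    return values


ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Memento: jcdl.org/archive/2002/index.html
"""
entry_points = proposed_directives(ROBOTS_TXT, "Memento")
print(entry_points)
```

Each returned value is a starting point; from there the robot can crawl on and discover TimeMaps via HTTP links, as the slide notes.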

43 Batch discovery of TimeGates: robots.txt. The robots.txt file is used by Web servers to convey crawling policies. Add a directive to support discovery of TimeGates known to the server: TimeGates can be on the server itself or on an external server; the value for the directive is typically a regular expression. E.g. example.org could point at TimeGates in its associated transactional archive ta.org via robots.txt: TimeGate: ta.org/timegate/http://example.org/* (will be promoted via Internet Draft)
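
To show how a client might use such a value, the sketch below checks candidate TimeGate URIs against the pattern from the slide. The slide calls the value a regular expression but the example uses a trailing "*", so this code assumes "*" is a simple wildcard; that reading, and the function name, are assumptions.

```python
import re


def matches_timegate_pattern(pattern, uri):
    """Check a URI against a TimeGate pattern where '*' matches any characters."""
    # Escape everything literally, then turn the escaped '*' back into a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, uri) is not None


PATTERN = "ta.org/timegate/http://example.org/*"
print(matches_timegate_pattern(PATTERN, "ta.org/timegate/http://example.org/news/index.html"))
print(matches_timegate_pattern(PATTERN, "ta.org/timegate/http://other.org/"))
```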

44 Discovery of Systems that Host Mementos: Registry/Feed. Registry of collections of Mementos, e.g. of Web Archives, Transactional Archives, etc. Feed of registry records. A registry record details essential characteristics of a Memento collection; cf. VOiD collection description for Linked Data. Will be researched.

45 Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies

46 Memento can recreate pages using resources from different archives. This poses a branding challenge.

47 Current Branding Practice for Web Archives: page and embedded resources from the same Web Archive; branding for page and embedded resources.

48 Branding for Web Archives in Memento Mode: page and embedded resources from various Web Archives. [Figure labels: Page branding; No branding; No branding] Will be researched.

49 Overview of Memento Framework; Deployment Progress; Memento and Data; Memento and Discovery; Memento and Branding; Alternative Web Archiving Strategies

50 Crawl-based Archives host distinct observations. Transactional Archives never miss an update.

51 Crawl-Based Web Archives: Observations. For example: Heritrix crawler for Internet Archive.

52 Crawl-Based Web Archives: collect discrete observations of resources, not their entire evolution; can be rejected (robots.txt, by user-agent, by host IP); can be deceived (cloaking, by geo-location, by user-agent); coverage of a particular Web server depends on crawl strategy.

53 Server-Side Transactional Web Archives: Change History. For example: TTApache, PageVault, Vignette Web Capture.

54 Server-Side Transactional Web Archives: collect all representations served by the to-be-archived server; the to-be-archived server needs to cooperate (incentives: e.g. institutional memory, official record of Web presence); archival coverage is restricted by the to-be-archived server and does not include external servers (e.g. embedded resources); the to-be-archived server can submit falsified information; archival collection management: what to keep, what not (e.g. significant changes, deduplication, …).
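
The difference between the two strategies can be illustrated with a toy model. The change history and observation times below are invented; the only point is that sparse crawls sample whatever happens to be live, while an archive that records every served representation sees each version that was requested at least once.

```python
def versions_seen(changes, observation_times):
    """Return the distinct versions observed, in order.

    changes: list of (time, version) pairs sorted by time.
    observation_times: times at which the resource is observed.
    """
    seen = []
    for t in observation_times:
        live = [version for when, version in changes if when <= t]
        # Record the live version only when it differs from the last one seen.
        if live and (not seen or seen[-1] != live[-1]):
            seen.append(live[-1])
    return seen


# Hypothetical change history of one resource.
CHANGES = [(0, "v1"), (3, "v2"), (4, "v3"), (9, "v4")]

crawl = versions_seen(CHANGES, [1, 5, 10])           # three crawler visits
transactional = versions_seen(CHANGES, range(0, 11))  # every request is captured
print(crawl)
print(transactional)
```

In this run the crawler never sees v2, which was live only between two visits, whereas the transactional capture records all four versions.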

55 Development of Transactional Web Archive Software. Submit: Java-Grizzly-Jersey submission interface application; Berkeley DB metadata store; FS store for body and headers. Capture: Apache connection filter module (mod_ta) captures URI, headers, body; module POSTs in real-time to the transactional archive’s Submit URI.
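
The capture-then-submit step can be sketched from the client side as the construction of a POST request. This is not how mod_ta works internally, which the slides do not describe: the JSON payload layout, the function name build_submission, and the Submit URI are all hypothetical, and the request is built but never sent.

```python
import json
from urllib.request import Request


def build_submission(submit_uri, captured_uri, headers, body):
    """Package one served representation as a POST to the archive's Submit URI."""
    # Hypothetical payload layout: the three captured parts named on the slide.
    payload = json.dumps({
        "uri": captured_uri,
        "headers": headers,
        "body": body.decode("utf-8", "replace"),
    }).encode("utf-8")
    return Request(
        submit_uri,
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


request = build_submission(
    "http://ta.example.org/submit",  # made-up Submit URI
    "http://example.org/index.html",
    {"Content-Type": "text/html"},
    b"<html>hello</html>",
)
print(request.get_method(), request.full_url)
```

Sending one such request per served response is what makes the archived content available immediately after it is served.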

56 Development of Transactional Web Archive Software. Development timeline: ongoing development (LANL) and testing (ODU); Submit/Access finalized; development focus on collection management; expected release as open source, 3rd Quarter 2011. Access: transactional archive natively supports Memento; immediate availability of archived content; export of WARC, e.g. for long-term archiving in other environment.

57 Memento http://mementoweb.org/ Herbert Van de Sompel, Robert Sanderson, Michael L. Nelson. Giant Leaps Towards Seamless Navigation of the Web of the Past

