An Approach to Persistence of Web Resources
Joachim Feise, University of California, Irvine, Information and Computer Science

1 An Approach to Persistence of Web Resources
Joachim Feise
University of California, Irvine
Information and Computer Science
http://www.ics.uci.edu/~jfeise/
jfeise@ics.uci.edu

2 Motivation
- Web resources change often
  - Previous versions are no longer accessible
  - Only the webmaster may know the resource history
  - The Web doesn’t have a memory
- Who needs a history of Web resources?
  - Organizations
  - Development teams
  - Historians
  - Journalists

3 Current Approaches
- Search engines
  - Only one version
  - Stored versions often outdated
- The Internet Archive
  - Currently 14-15 TB of Web resources
  - Collecting since October 1996
  - No metadata storage
  - Related resources may be scattered across files
  - Online access probably infeasible
  - Low collection frequency

4 Our Architecture
[Figure: architecture diagram with three components: the Web, a Proxy/Cache, and a Configuration Management System]

5 Resource Storage and Access
- Modified Squid proxy/Web cache
  - Piggybacking on cache functionality
- Access to historical versions
  - Detection of date/revision selection
  - Navigational features: next/previous day/month/revision
- Connection to the Configuration Management System (CMS)
  - Retrieval of the requested revision from the CMS
  - Possibility of distributing the CMS storage
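
The slide does not say how a date or revision selection is encoded in a request, or how the navigation steps are computed. The following Python sketch assumes a hypothetical query parameter `x-archive-date` carrying a `YYYYMMDDhhmmss` timestamp, plus a sorted list of commit times per URL. It strips the parameter to recover the original URL and maps a date, or a next/previous day/month/revision step, to the revision that was current at that moment. It is an illustration only, not the prototype's Squid code.

```python
import calendar
from bisect import bisect_right
from datetime import datetime, timedelta
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit

# Hypothetical parameter name; the slides do not specify the encoding.
DATE_PARAM = "x-archive-date"


def split_selection(url):
    """Split a request URL into (original URL, selected datetime or None)."""
    parts = urlsplit(url)
    query = parse_qs(parts.query, keep_blank_values=True)
    raw = query.pop(DATE_PARAM, [None])[0]
    selected = datetime.strptime(raw, "%Y%m%d%H%M%S") if raw else None
    clean = urlunsplit(parts._replace(query=urlencode(query, doseq=True)))
    return clean, selected


def revision_at(timestamps, when):
    """Index of the latest revision stored at or before `when`, or None.

    `timestamps` must be sorted in ascending order.
    """
    i = bisect_right(timestamps, when)
    return i - 1 if i > 0 else None


def shift_months(when, months):
    """Move `when` by whole months, clamping the day to the target month's length."""
    index = when.month - 1 + months
    year, month = when.year + index // 12, index % 12 + 1
    day = min(when.day, calendar.monthrange(year, month)[1])
    return when.replace(year=year, month=month, day=day)


def navigate(timestamps, current, unit, direction):
    """Revision index reached by one next (+1) or previous (-1) step.

    `unit` is "revision", "day" or "month". Steps before the first or past
    the last revision stay on the current one.
    """
    if unit == "revision":
        target = current + direction
        return target if 0 <= target < len(timestamps) else current
    when = timestamps[current]
    if unit == "day":
        when += timedelta(days=direction)
    elif unit == "month":
        when = shift_months(when, direction)
    else:
        raise ValueError(f"unknown unit: {unit!r}")
    found = revision_at(timestamps, when)
    return current if found is None else found
```

A "next day" step can land on the same revision when the page did not change that day, which matches how the archived history would look to a user browsing it.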

6 Transparency
- Transparent access through the proxy
  - Browser usage doesn’t change
- Comparison of the last stored version with the current version
- The user’s selection is stored in the CMS with the current date/time
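
A minimal Python sketch of the capture step on this slide, assuming the proxy compares a SHA-256 digest of the freshly fetched body with the digest of the last stored revision and commits a new, timestamped revision only when they differ. `InMemoryStore` is a hypothetical stand-in for the CMS; the prototype's actual comparison method and storage interface are not described in the slides.

```python
import hashlib
from datetime import datetime, timezone


class InMemoryStore:
    """Hypothetical stand-in for the CMS: a list of (time, digest, body) per URL."""

    def __init__(self):
        self._revisions = {}

    def latest_digest(self, url):
        revisions = self._revisions.get(url)
        return revisions[-1][1] if revisions else None

    def commit(self, url, body, digest, when):
        self._revisions.setdefault(url, []).append((when, digest, body))

    def history(self, url):
        return [when for when, _, _ in self._revisions.get(url, [])]


def capture(store, url, body, now=None):
    """Commit `body` as a new revision only if it differs from the last stored one.

    Returns True when a new revision was committed.
    """
    digest = hashlib.sha256(body).hexdigest()
    if store.latest_digest(url) == digest:
        return False  # unchanged since the last stored version: nothing to store
    store.commit(url, body, digest, now or datetime.now(timezone.utc))
    return True
```

Keeping the digest next to each revision means the check does not have to read the previous body back from the CMS on every proxied request.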

7 Example

8 Limitations
- Resource location changes
- Resource deletion
- Collection frequency
- Difficulty of capturing highly dynamic resources
- Only visited pages get collected
- Link consistency problems

9 Legal Issues
- Intellectual property rights and privacy
  - Configuration for opt-out/opt-in strategies
  - Granularity: group-wide/company-wide settings
  - Deleting all old revisions?
- Copyright issues
- Access rights
  - Who can view what?
  - Rights may change over time
- Censorship
  - Bypassing with P2P technology, e.g., Freenet
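
One way to realize opt-in/opt-out configuration with group-wide and company-wide granularity is a lookup in which the most specific explicit setting wins. The Python sketch below is hypothetical: the `ArchivePolicy` class, the additional per-user level, and the precedence order are assumptions, not part of the presented design.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ArchivePolicy:
    """Hypothetical archiving policy at company, group and user level.

    True means "archive this user's page visits", False means "do not".
    A missing entry or None at a level inherits from the broader level.
    """

    company_default: bool
    groups: Dict[str, Optional[bool]] = field(default_factory=dict)
    users: Dict[str, Optional[bool]] = field(default_factory=dict)

    def should_archive(self, user: str, group: str) -> bool:
        # Most specific explicit setting wins: user, then group, then company.
        for setting in (self.users.get(user), self.groups.get(group)):
            if setting is not None:
                return setting
        return self.company_default
```

A `company_default` of `False` models an opt-in strategy (nothing is archived unless a group or user opts in); `True` models opt-out.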

10 Conclusions
- New approach to accessing histories of Web resources
- Designed for online access with a standard browser
- Prototype implementation
  - Used for performance tests
  - Scalability remains to be tested
- Considering backend storage replacement, e.g., with a DeltaV server
- Legal issues exist

11 Thank You
Thank you for your attention