Hiberlink – Towards Time Travel for the Scholarly Web
July 25th 2013, Indianapolis, IN, USA


1 Hiberlink – Towards Time Travel for the Scholarly Web
July 25th 2013, Indianapolis, IN, USA
Martin Klein, martinklein0815@gmail.com, @mart1nkle1n
Robert Sanderson, azaroth42@gmail.com, @azaroth42
Herbert Van de Sompel, hvdsomp@gmail.com, @hvdsomp
http://www.hiberlink.org/
http://www.mementoweb.org/
The Hiberlink Project is supported by the Andrew W. Mellon Foundation

2 Hiberlink Project and Partners
LANL: Herbert Van de Sompel, Rob Sanderson, Martin Klein
U. Edinburgh: Claire Grover, Beatrix Alex, Richard Tobin, Adam Zhou
EDINA: Peter Burnhill, Christine Rees, Muriel Mewissen, Tim Strickland, Neil Mayo
Two-year project funded by the Andrew W. Mellon Foundation

3 Problem Statement
Preservation of formal scholarly output is (relatively) well understood.
Preservation of the resources that make up the context for that research is not:
  Datasets
  Software
  Workflows
  Videos, Slides
  Project and Demonstration web sites
  AJAX
  …

4 Pilot Study
To what extent are web resources that are referenced from works in repositories still available at their original URL … or from archives of web resources?
Participants: LANL, UNT, arXiv
Paper: http://arxiv.org/abs/1105.3459
Contributions:
  Much larger scale than any previous study: 162,052 unique URLs
  Automatically searched multiple archives for all URLs, rather than manually for a small subset

5 Pilot Study: Method
Links: Extract Links, Normalize Links, Filter Links *
Metadata: Extract Metadata, Normalize Metadata
Extracted tuples: (URL, Paper, Time, Subject)
Results: (URL, Time, Memento-Time, Paper, Subject)
* We filtered broken and intra/inter-repository links.
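The link side of the method on this slide (extract, normalize, filter) can be sketched roughly as follows. This is an illustrative sketch only, not the pilot study's actual code: the regular expression, the normalization rules, the function names, and the REPOSITORY_HOSTS set are all assumptions made for the example.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Assumption: a deliberately simple pattern for http(s) URLs in running text.
URL_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+", re.IGNORECASE)

# Assumption: hosts treated as intra/inter-repository for this example.
REPOSITORY_HOSTS = {"arxiv.org", "digital.library.unt.edu"}


def extract_links(text):
    """Find URL-like strings and trim trailing sentence punctuation."""
    return [match.rstrip(".,;:") for match in URL_PATTERN.findall(text)]


def normalize_link(url):
    """Lower-case scheme and network location, drop the fragment, default the path to '/'."""
    parts = urlsplit(url)
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", parts.query, "")
    )


def keep_link(url, repository_hosts=REPOSITORY_HOSTS):
    """Reject malformed hosts and links that point back into a repository."""
    host = urlsplit(url).hostname
    if not host or "." not in host:
        return False
    return not any(host == h or host.endswith("." + h) for h in repository_hosts)


def links_for_paper(paper_id, published, subject, text):
    """Return de-duplicated (URL, Paper, Time, Subject) tuples for one paper."""
    seen = set()
    rows = []
    for raw in extract_links(text):
        url = normalize_link(raw)
        if keep_link(url) and url not in seen:
            seen.add(url)
            rows.append((url, paper_id, published, subject))
    return rows
```

Each returned tuple has the (URL, Paper, Time, Subject) shape named on the slide; a later archive lookup would add the Memento-Time.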

6 Memento
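Memento lets a client ask for the version of a resource closest to a given date by sending an Accept-Datetime request header; the archived copy comes back with a Memento-Datetime response header (header names as later standardized in RFC 7089). A minimal sketch of the two header operations, with no network access, is shown here. The function names and the "archive lag" measure are assumptions for illustration, not part of the slides.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_header(moment):
    """Build the Accept-Datetime request header for a timezone-aware datetime."""
    as_utc = moment.astimezone(timezone.utc)
    # usegmt=True renders the zone as "GMT", the usual HTTP date form.
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}


def archive_lag_days(published, response_headers):
    """Whole days between publication and the returned memento's datetime.

    Returns None when the response carries no Memento-Datetime header.
    `response_headers` is assumed to be a plain dict with exact-case keys.
    """
    value = response_headers.get("Memento-Datetime")
    if value is None:
        return None
    return (parsedate_to_datetime(value) - published).days
```

For example, a paper published on 17 May 2011 whose referenced URL has a memento dated 1 June 2011 would yield a lag of 15 days.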

7 Pilot Study: Results
UNT:
  72% in archives and/or still exist
  High proportion of archived URLs, possibly due to academic level and general disciplines
arXiv:
  78% in archives and/or still exist
  45% still exist, but not archived!
  Possibly due to high value, but very discipline-specific references
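Percentages like those on this slide come from classifying each URL by two yes/no observations: does it still resolve, and does an archive hold a copy. The sketch below shows one way such a tally could be computed. The record layout and function name are assumptions, and it is not the study's own analysis code.

```python
from collections import Counter


def summarize(records):
    """Summarize (url, is_live, is_archived) records as percentages.

    Returns the share of URLs that are live and/or archived, and the share
    that are live but have no archived copy.
    """
    counts = Counter()
    for _url, is_live, is_archived in records:
        if is_live and is_archived:
            counts["live_and_archived"] += 1
        elif is_live:
            counts["live_not_archived"] += 1
        elif is_archived:
            counts["archived_only"] += 1
        else:
            counts["lost"] += 1
    total = sum(counts.values())
    if total == 0:
        return {"available": 0.0, "live_not_archived": 0.0}
    available = total - counts["lost"]
    return {
        "available": 100.0 * available / total,
        "live_not_archived": 100.0 * counts["live_not_archived"] / total,
    }
```

The "live but not archived" share is the group most at risk: those resources can still be captured, but nothing guarantees they will be.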

8 Hiberlink: Quantify Full Extent of the Problem
To what extent are web resources that are referenced from works in repositories still available at their original URL … or from archives of web resources?
Redo the same experiment with…
  Even larger dataset with millions of papers and URLs
  Text mining processes for URL extraction
  Track location of URL (citations, footnote, text, etc.)
  Evaluation of extraction via gold standard dataset
  Determine type of resource referenced
  Track type of publication (journal, thesis, report, etc.)

9 Hiberlink: Propose Solutions (1)
We propose two active archiving solutions for resources referenced from scholarly papers, to ensure that the scholarly record remains unbroken.
1. Active Crawling:
  Run extraction routines at repositories, publishers, or third parties via text mining agreements or open access publications
  Feed the URL seed list to existing web crawlers, such as the Internet Archive
  IA (and others) are already Memento compliant

10 Hiberlink: Propose Solutions (2)
2. Transactional Archiving:
  A willing server forks its responses for resources, sending each one both to the browser and to an archive for preservation
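One way to picture the "fork" is as a thin layer in front of the web application that passes each response to the client unchanged while handing a copy to an archive. The sketch below does this as Python WSGI middleware. It is an assumption-laden illustration, not the project's implementation: the class name and the `archive` callback signature are hypothetical, the whole body is buffered in memory, and output sent through the legacy `write()` callable is not captured.

```python
from wsgiref.util import request_uri


class ArchivingMiddleware:
    """Wrap a WSGI app and copy every response to an archive callback."""

    def __init__(self, app, archive):
        self.app = app
        # Hypothetical callback: archive(url, status, headers, body)
        self.archive = archive

    def __call__(self, environ, start_response):
        captured = {}

        def capturing_start_response(status, headers, exc_info=None):
            # Remember status and headers, then pass them on untouched.
            captured["status"] = status
            captured["headers"] = list(headers)
            return start_response(status, headers, exc_info)

        result = self.app(environ, capturing_start_response)
        try:
            # Simplification: buffer the full body so it can be sent twice.
            body = b"".join(result)
        finally:
            close = getattr(result, "close", None)
            if close is not None:
                close()

        # The archive receives the same bytes the browser will receive.
        self.archive(request_uri(environ), captured["status"], captured["headers"], body)
        return [body]
```

A production version would more likely stream the body and hand the copy to the archive asynchronously, so that a slow or unavailable archive never delays the browser.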

11 Summary
2011 pilot study showed:
  Significant problem!
  Random archiving by web crawlers is not enough
Hiberlink project will:
  Fully quantify the extent to which web resources that form the context of scholarly output are available and archived
  Propose active solutions to prevent the loss of further resources
  Use Memento for both research and access


