Presentation is loading. Please wait.

Presentation is loading. Please wait.

ResourceSync was funded by the Sloan Foundation & JISC A Modular Framework for Web-Based Resource Synchronization Martin Klein Los Alamos National Laboratory.

Similar presentations


Presentation on theme: "ResourceSync was funded by the Sloan Foundation & JISC A Modular Framework for Web-Based Resource Synchronization Martin Klein Los Alamos National Laboratory."— Presentation transcript:

1 ResourceSync was funded by the Sloan Foundation & JISC A Modular Framework for Web-Based Resource Synchronization Martin Klein Los Alamos National Laboratory @mart1nkle1n http://www.openarchives.org/rs #resourcesync Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp ResourceSync

2 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 ResourceSync Collaboration between NISO and the Open Archives Initiative Funded by the Sloan Foundation and JISC Goal: Devise a specification for web-based resource synchronization

3 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 This ResourceSync Presentation Problem Domain Scope Framework - Overview Framework – Technology Demonstration Status

4 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Background - OAI-PMH Recurrent metadata exchange from a Data Provider to Service Providers XML metadata only Repository centric Devised 1999-2002, prior to REST, prior to dominance of web search engines

5 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Revisit the Problem Domain - ResourceSync Synchronization of resources from a Source to Destinations Web resources, anything with an HTTP URI & representation Resource centric Devised 2012-2013, leverages key ingredients of web interoperability, existing specifications, existing Search Engine Optimization practice

6 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Problem Statement Consideration: Source (server) A has resources that change over time: they get created, modified, deleted Destination (servers) X, Y, and Z leverage (some) resources of Source A Problem: Destinations want to keep in step with the resource changes at Source A

7 A Source’s Resources

8 A Source’s Resources Evolve over Time

9

10

11

12

13

14 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Problem Statement Consideration: Source (server) A has resources that change over time: they get created, modified, deleted Destination (servers) X, Y, and Z leverage (some) resources of Source A Problem: Destinations want to keep in step with the resource changes at Source A Goal: Design an approach for resource synchronization aligned with the Web Architecture that has a fair chance of adoption by different communities

15 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 This ResourceSync Presentation Problem Domain Scope Framework - Overview Framework – Technology Demonstration Status

16 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Scope – Collection Size Size of a Source’s resource collection: A few resources - small web sites, repositories Millions of resources – large repositories, datasets, linked data collections

17 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Scope – Change Frequency Change frequency of a Source’s resources: Low – daily, weekly, monthly High – seconds, minutes

18 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Scope – Synchronization Latency Destination’s requirements regarding synchronization latency: High latency acceptable Low latency essential

19 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Scope – Collection Coverage Destination’s requirements regarding the coverage of a Source’s resources: Partial coverage of the Source’s resources acceptable Full coverage of the Source’s resources verifiable

20 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Scope – Bitstream Accuracy Destination’s requirements regarding bitstream accuracy: Unverifiable bitstream accuracy acceptable Verifiable bitstream coverage essential

21 One to One Synchronization

22 One to Many – Master Copy

23 Many to One - Aggregator

24 Selective Synchronization

25 Metadata Harvesting

26 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 This ResourceSync Presentation Problem Domain Scope Framework - Overview Framework – Technology Demonstration Status

27 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 A Source’s Resources Evolve over Time

28 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Solution Perspective - Destination Destination needs regarding synchronization: Baseline synchronization: Initial catch-up operation to align with the Source’s resources Incremental synchronization: Remain synchronized as the Source’s resources evolve Audit: Destination determines whether it effectively is in sync with the Source -Bitstream accuracy -Coverage of resources

29 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Solution Perspective - Source Source communicates about the state of its resources: Publish inventory: snapshot of the state of resources at a moment in time Publish changes: enumeration of resource changes that occurred during a temporal interval Notify about changes: send notifications as changes occur Communication payload: Minimal, e.g. HTTP URI of resource Additional, e.g. content-based hash of resource

30 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Resource List In order to meet a Destination’s need for baseline synchronization, the Source may publish a Resource List A Resource List is an inventory, a snapshot of existing resources Per resource, it minimally provides the resource’s URI Process: -Destination obtains the Resource List -Destination obtains listed resources by their URI -Optimization: Resource Dump, a list pointing to ZIP files that contain resource representations

31 Publish Resource List: Inventory at Tx Resource List @Tx = { A ; B ; C }

32 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Change List In order to meet a Destination’s need for incremental synchronization, the Source may publish a Change List A Change List enumerates resource change events that occurred in a temporal interval For each event, it minimally lists datetime, URI of the resource, the nature of the change Process: -Destination obtains the Change List -Destination obtains created/updated resources, removes deleted resources -Optimization: Change Dump

33 Publish Change List: Resource Changes During Interval Ty-Tz Change List [Ty,Tz] = { A updated @Tc ; B updated @Tc ; C created @Td ; D deleted @Te ; C updated @Tf }

34 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Change Notification In order to meet a Destination’s need for incremental synchronization and low latency, the Source may send Change Notifications A Change Notification conveys resource change events as they occur For each event, it minimally lists datetime, URI of the resource, the nature of the change -Process: -Destination receives Change Notification -Destination obtains created/updated resources, removes deleted resources

35 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Send Change Notification – Resource Changes at Ta Change Notification @Ta = { A updated @Ta }

36 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Send Change Notification – Resource Changes at Tb Change Notification @Tb = { D updated @Tb }

37 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Send Change Notification – Resource Changes at Tc Change Notification @Tc = { A updated @Tc ; B updated @Tc }

38 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Send Change Notification – Resource Changes at Td Change Notification @Td = { C created @Td }

39 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Send Change Notification – Resource Changes at Te Change Notification @Te = { D deleted @Te }

40 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Send Change Notification – Resource Changes at Tf Change Notification @Tf = { C updated @Tf }

41 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Communication Payload – Metadata & Links A Source may provide additional metadata and links pertaining to resources conveyed in Resource Lists, Change Lists, Change Notifications Metadata about a resource: content encoding, content length, mime type, content-based hash Linking to related resources: mirror copies, alternate representations, resource versions, diff between current and previous version, metadata-to-content link, content-to- metadata link, collection membership, etc.

42 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Communication Payload – Metadata – Hash In order to meet a Destination’s need for audit, the Source may provide a content-based hash pertaining to a resource Source computes the content-based hash for a resource Source provides the hash as metadata pertaining to the resource in its communication payload Destination processes communication payload, obtains the resource Destination computes the content-based hash for the obtained resource, compares with the Source’s

43 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Communication Payload – Link – Interlink Metadata & Content In order to allow a Destination to establish the relationship between a Source’s metadata and a Source’s content, the Source may provide appropriate links Metadata resources and content resources are just resources identified by HTTP URIs Both can independently be subject to synchronization and can be interlinked using appropriately typed links

44 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Communication Payload – Link – Link to Diff In order to minimize content transfer, a Source may link to a diff between the previous and the new version of a resource Destination can obtain the diff and patch its (previous) version of the resource Connection between the resource and the diff is established by means of appropriately typed link Nature of the diff is established by means of MIME type  Few diff MIME types exist. Communities can establish their own.

45 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Further Framework Characteristics Modular: A Source does not have to implement all capabilities Source decides which capabilities to support based on local and community requirements Sets of Resources: Division of a Source’s resource collection in logical groupings. Supported capabilities can differ per set Discovery: Mechanisms for Destinations to determine whether and how a Source supports ResourceSync Based on conventions for web discovery and documents that detail the level of support

46 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 This ResourceSync Presentation Problem Domain Scope Framework - Overview Framework – Technology Demonstration Status

47 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Sitemap Protocol ResourceSync builds on the Sitemap protocol used by major search engines Similarity between resource synchronization and resource discovery/indexing Extends the Sitemap protocol to meet synchronization needs Cf. Metadata and Links Sitemap document format is used throughout the framework to express Resource Lists, Change Lists, etc. Type of ResourceSync document can be determined through explicit declaration

48 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Common Sitemap http://example.com/res1 2013-01-02T13:00:00Z http://example.com/res2 2013-01-02T14:00:00Z …

49 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Resource List <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability="resourcelist" at="2013-01-03T09:00:00Z” /> http://example.com/res1 2013-01-02T13:00:00Z <rs:md hash="md5:1584abdf8ebdc9802ac0c6a7402c03b6" length="8876" type=”application/pdf” /> …

50 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Change List <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> <rs:md capability=”changelist" from="2013-01-02T09:00:00Z” until="2013-01-03T09:00:00Z” /> http://example.com/res2 2013-01-02T13:00:00Z <rs:ln href=“http://example.com/res2/meta” rel=“describedby” /> …

51 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Change Notification <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:rs="http://www.openarchives.org/rs/terms/"> http://example.com/res3 2013-01-02T13:00:00Z <rs:md change=”updated” type=”application/json” /> <rs:ln href=“http://example.com/res3/diff” rel=“http://www.openarchives.org/rs/terms/patch” type=“application/json-patch”/> …

52 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 PubSubHubbub Protocol ResourceSync builds on the PuSH protocol used for syndication of Atom/RSS feeds Introduces a novel use for pushing change notifications from a Source to subscribing Destinations Destinations subscribe to a Source’s change notifications via a Hub The Source pushes change notifications out to the Hub The Hub relays change notifications from the Source to the subscribing Destinations

53 Destination Discovers Source’s Notification Channel & Hub

54 Destination Subscribes to Source’s Notification Channel

55 Source Sends Change Notification to Hub

56 Hub Relays Change Notification to Destination(s)

57 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 This ResourceSync Presentation Problem Domain Scope Example Use Cases Framework Characteristics Demonstration Status

58 Source Sends Change Notifications

59 Destination Acts Upon Change Notifications

60 Observe the State of Source and Destination

61 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 This ResourceSync Presentation Problem Domain Scope Example Use Cases Framework Characteristics Demonstration Status

62 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Status of Specifications ResourceSync Core Specification currently in NISO Voting Pool Ends April 16 2014 Please vote if you are a NISO member If vote is positive should be ANISO/NISO Z39.99 by July 2014 Spec on OAI web site fully aligned with NISO Standard ResourceSync Notification Specification currently in beta Ongoing tests with PubSubHubbub Hub, Source, Destination Python software to be released Public Hub for testing will be made available ResourceSync Archive Specification currently in beta

63 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Pointers Specification http://www.openarchives.org/rs/ http://www.openarchives.org/rs/resourcesync http://www.openarchives.org/rs/notification http://www.openarchives.org/rs/archives List for public comment https://groups.google.com/d/forum/resourcesync Building blocks http://sitemaps.org https://code.google.com/p/pubsubhubbub/

64 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Papers Klein, M., and Van de Sompel, H. (2013) Extending Sitemaps for Resourcesync. http://arxiv.org/abs/1305.4890 ACM/IEEE JCDL 2013http://arxiv.org/abs/1305.4890 Haslhofer, B., Warner, S, Lagoze, C., Klein, M., Sanderson, R., Nelson, M.L. and Van de Sompel, H. (2013) ResourceSync: Leveraging Sitemaps for Resource Synchronization. http://arxiv.org/abs/1305.1476 WWW 2013 Developer Trackhttp://arxiv.org/abs/1305.1476 Klein, M., Sanderson, R., Van de Sompel, H., Warner, S, Haslhofer, B., Lagoze, C., and Nelson, M.L. (2013) A Technical Framework for Resource Synchronization. http://dx.doi.org/10.1045/january2013-klein D-Lib Magazine.http://dx.doi.org/10.1045/january2013-klein Van de Sompel, H., Sanderson, R., Klein, M., Nelson, M.L., Haslhofer, B., Warner, S, and Lagoze, C. (2012) A Perspective on Resource Synchronization. http://dx.doi.org/10.1045/september2012-vandesompel D- Lib Magazine.http://dx.doi.org/10.1045/september2012-vandesompel

65 Herbert Van de Sompel, Martin Klein - ResourceSync CNI Spring 2014, St. Louis, MO, March 31 2014 Acknowledgments Editors: Martin Klein, Robert Sanderson, Herbert Van de Sompel (Los Alamos National Laboratory), Simeon Warner (Cornell University), Graham Klyne ( University of Oxford), Bernhard Haslhofer (University of Vienna), Michael L. Nelson (Old Dominion University), Carl Lagoze ( University of Michigan) Contributors: Peter Murray (Lyrasis), Todd Carpenter, Nettie Lagace (NISO), Richard Jones (Cottage Labs), Stuart Lewis (University of Edinburgh), Paul Walk (University of Edinburgh), Jeff Young (OCLC), Shlomo Sanders (Ex Libris), Kevin Ford (Library of Congress) Demonstration: Martin Klein, Harihar Shankar, Herbert Van de Sompel (Los Alamos National Laboratory)

66 ResourceSync was funded by the Sloan Foundation & JISC A Modular Framework for Web-Based Resource Synchronization Martin Klein Los Alamos National Laboratory @mart1nkle1n http://www.openarchives.org/rs #resourcesync Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp ResourceSync


Download ppt "ResourceSync was funded by the Sloan Foundation & JISC A Modular Framework for Web-Based Resource Synchronization Martin Klein Los Alamos National Laboratory."

Similar presentations


Ads by Google