
Digital Library Interoperability Architecture CS 502 – 20030305 Carl Lagoze – Cornell University.


1 Digital Library Interoperability Architecture CS 502 – 20030305 Carl Lagoze – Cornell University

2 Interoperability is multidimensional
Syntax – XML
Semantics – RDF/RDFS/OWL
Vocabularies/Ontologies – Dublin Core/ABC/CIDOC-CRM
Search and discovery – Z39.50, SDLIP, ZING
Document models – METS, FEDORA

3 Contrast to Distributed Systems
Distributed systems – collections of components at different sites that are carefully designed to work with each other
Heterogeneous or federated systems – cooperating systems in which individual components are designed or operated autonomously

4 Measuring success of interoperability solutions
Degree of component autonomy
Cost of infrastructure
Ease of contributing components
Ease of using components
Breadth of task complexity supported by the solution
Scalability in the number of components

5 Families of interoperability solutions

6 Interoperability Trade-offs
[Chart: solutions placed along axes of Cost and Functionality: HTTP, Google, Z39.50, SGML, Dublin Core, Metadata Harvesting, Dienst]

7 Dienst is a protocol and reference implementation of a distributed digital library service, in which a network of services provides World Wide Web browser access, uniform search over distributed indexes, and access to structured documents.

8 Why a service-based protocol? To expose the operational semantics of the services through an API, permitting flexible integration of the services and their use by other clients, consumers, and services.

9 Defining the services
Repository – deposit, storage, and access to structured documents
Index – process queries on documents and return handles
Query Mediator – route queries to appropriate indexes
Collection – define services and content in logical collections
User Interface – human-oriented front end for services
Name Server – resolve URNs (handles) to document location(s)

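10 Dienst Services
[Diagram: a WWW browser talks to the User Interface; a user query becomes a generic search request to the Query Mediator (QM), which sends specific search requests to Index servers; a user document request is resolved through the Name Server (NS) into a URI document request to a Repository; the Collection service supplies collection metadata]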

11 Defining the protocol
Structured messages:
– Service
– Version
– Verb
– Arguments
Template: /Dienst/<service>/<version>/<verb>[/<arguments>][?<keyword>=<value>]
Example: /Dienst/Repository/4.0/Formats/ncstrl.cornell/TR94-1418
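A minimal Python sketch of how a client or server might split such a request into the four parts of a structured message. The function name `parse_dienst_path` is hypothetical, and it assumes (from the example above and the demonstration URLs on slide 15) that arguments are `/`-separated path segments, optionally followed by `?keyword=value` pairs; it is not the reference implementation's parser.

```python
from urllib.parse import urlsplit, parse_qs

def parse_dienst_path(url):
    """Split a Dienst request into service, version, verb, and arguments.

    Hypothetical helper; assumes the layout
    /Dienst/<service>/<version>/<verb>[/<arguments>][?<keyword>=<value>].
    """
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if len(segments) < 4 or segments[0] != "Dienst":
        raise ValueError(f"not a Dienst request: {url!r}")
    service, version, verb = segments[1:4]
    return {
        "service": service,
        "version": version,
        "verb": verb,
        "args": segments[4:],            # positional arguments, e.g. a document handle
        "query": parse_qs(parts.query),  # keyword arguments after "?", if any
    }

msg = parse_dienst_path("/Dienst/Repository/4.0/Formats/ncstrl.cornell/TR94-1418")
print(msg["verb"], msg["args"])  # Formats ['ncstrl.cornell', 'TR94-1418']
```

Because the version travels in every message, a server can keep answering older clients (the demonstration URLs mix versions 1.0, 2.0, and 4.0) while new verbs are added.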

12 Why a Document Model?
“Documents” on the current web are both:
– Unstructured (GET)
– Chaotic (CGI)
Different views and pieces of content are needed for:
– Bandwidth reduction
– Rights management
– Usability

13 Dienst Document Model
Metadata – support for multiple descriptive formats
Views – alternative expressions or structural representations of the content encapsulated in the digital object
Divs – hierarchically nested structure contained in a view
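The three-part model can be sketched as nested Python data classes. This is an illustrative assumption rather than Dienst's actual implementation: the class names, the helper `div_paths`, the view name `body`, and the div names are all hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass, field

@dataclass
class Div:
    """A node in the hierarchically nested structure of a view."""
    name: str
    children: list[Div] = field(default_factory=list)

@dataclass
class View:
    """One alternative expression of the object's content."""
    name: str
    divs: list[Div] = field(default_factory=list)

@dataclass
class DigitalObject:
    handle: str
    metadata: dict[str, str] = field(default_factory=dict)  # descriptive format -> record
    views: dict[str, View] = field(default_factory=dict)    # view name -> view

def div_paths(div: Div, prefix: str = "") -> list[str]:
    """Return every div under `div`, depth first, as slash-separated paths."""
    path = f"{prefix}/{div.name}" if prefix else div.name
    paths = [path]
    for child in div.children:
        paths.extend(div_paths(child, path))
    return paths

obj = DigitalObject(
    handle="ncstrl.cornell/TR94-1418",
    views={"body": View("body", [Div("chapter1", [Div("page1"), Div("page2")])])},
)
print(div_paths(obj.views["body"].divs[0]))
# ['chapter1', 'chapter1/page1', 'chapter1/page2']
```

Keeping metadata, views, and divs separate is what lets a request select one component (a single page, say) instead of the whole document.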

14 Expressing the document model in the protocol
Structure – expose the views and structure of the digital object
Disseminate – select the structural component (and its packaging) to disseminate
List-Meta-Formats – list available descriptive formats

15 Protocol Demonstration
http://techreports.library.cornell.edu:8081/Dienst/Repository/4.0/List-Contents?file-after=2003-01-01
http://techreports.library.cornell.edu:8081/Dienst/Repository/1.0/Disseminate/cul.cs/TR90-1160/%23oams/xml
http://techreports.library.cornell.edu:8081/Dienst/Repository/2.0/Structure/cul.cs/TR90-1160
http://techreports.library.cornell.edu:8081/Dienst/Repository/4.0/Formats/cul.cs/TR90-1160?part=body
http://techreports.library.cornell.edu:8081/Dienst/Repository/1.0/Disseminate/cul.cs/TR90-1160/body/inline?pageimage=3

16 Collection Service
Periodically polled by each user interface server for:
– elements of the collection
– index servers for the collection
[Diagram: User Interface Servers and Index Servers]

17 Deploying Collection Globally
Internet connectivity varies considerably.
Good connectivity between nodes often does not correspond to geographic proximity.
Connectivity Region – a group of nodes on the network that have good connectivity among themselves, relative to nodes outside the region.

18 Connectivity Regions
When possible, route queries within the region.
In case of failure, use an alternate either within the region or in a “nearby” region.
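The routing rule above can be sketched as a simple failover search. This is a hypothetical illustration: the region names, the ordered `neighbors` map, and the `is_up` reachability check are assumptions, since the slides do not say how servers are probed or how “nearby” regions are ranked.

```python
def choose_index_server(servers, home_region, neighbors, is_up):
    """Pick a reachable index server, preferring the home connectivity region.

    servers:   region name -> list of index servers in that region
    neighbors: region name -> list of "nearby" regions, closest first
    is_up:     callable that reports whether a server is currently reachable
    """
    for region in [home_region, *neighbors.get(home_region, [])]:
        for server in servers.get(region, []):
            if is_up(server):
                return server
    return None  # nothing reachable in this region or any nearby region

servers = {"north-america": ["idx-a", "idx-b"], "europe": ["idx-c"]}
neighbors = {"north-america": ["europe"]}
# idx-a is down, so the query fails over to idx-b within the same region.
print(choose_index_server(servers, "north-america", neighbors, lambda s: s != "idx-a"))  # idx-b
```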

19 Origins of the OAI
Increasing interest in alternative scholarly publishing solutions, e.g., LANL arXiv
Increasing impact through federation
UPS meeting, Santa Fe, October 1999
– Representatives of various ePrint, library, and publishing communities
– Goal: definition of an interoperability framework among ePrint providers
– Reality: rich interoperability protocols like Dienst are too complicated for widespread deployment
– Result: the Santa Fe Convention, interoperability through metadata harvesting

20 The World According to OAI
[Diagram: Data Providers supply metadata through Metadata harvesting to Service Providers, which offer Discovery, Current Awareness, and Preservation services]

21 Yes, it's about resource discovery over distributed collections
[Diagram: metadata records with Author, Title, Abstract, and Identifier fields]

22 Facilitating/Monitoring Longevity of Distributed Content
[Diagram: Preservation Service]

23 Personalization of Content
[Diagram: a DigitalObject combining RealAudio video, a PowerPoint presentation, SMIL synchronization metadata, and structural metadata, with a Tool Repository, presented differently through two portals. Portal A (View A): view slides, view video, view synchronized presentation using an applet. Portal B (View B): get transcript of audio, search for keyword, get slides translated to French.]

24 Cross-Repository Reference Linking
[Diagram: citation metadata from several repositories feeding a Linkage Service]

25 OAI Technical Infrastructure
Key technical features
Deploy-now technology – 80/20 rule
Two-party model – providers (data providers) and consumers (service providers)
Simple HTTP encoding
XML schema for some degree of protocol conformance
Extensibility
– Multiple item-level metadata formats
– Collection-level metadata

26 Content and Metadata
[Diagram: the relationship among a resource, its item (metadata), the repository, and records]

27 http://www.openarchives.org/OAI/openarchivesprotocol.html

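28 record
[Diagram: an example record with identifier oai:eg:001, datestamp 1999-01-01, metadata titled “My Example”, and “No restrictions” rights data; its parts are labeled protocol support, format-specific metadata, and community-specific record data]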
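The three labeled parts of the example record map naturally onto a small data structure. This Python sketch is illustrative only: the class and field names are assumptions chosen to mirror the slide's labels, not the protocol's XML schema.

```python
from dataclasses import dataclass, field

@dataclass
class Header:
    identifier: str  # unique OAI identifier, e.g. "oai:eg:001"
    datestamp: str   # date the record was created or last modified

@dataclass
class Record:
    header: Header                                # protocol support
    metadata: dict = field(default_factory=dict)  # format-specific metadata
    about: list = field(default_factory=list)     # community-specific record data

example = Record(
    header=Header(identifier="oai:eg:001", datestamp="1999-01-01"),
    metadata={"title": "My Example"},
    about=["No restrictions"],
)
```

Only the header is needed for the protocol machinery (identification and datestamp-based harvesting); the metadata and about parts are opaque payload as far as the protocol is concerned.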

29 selective harvesting - datestamps
[Diagram: harvest records from a repository within a date range]
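The sketch below shows both sides of datestamp-based harvesting. It assumes the `from` and `until` request arguments and the inclusive-bounds rule of OAI-PMH 2.0 (the slide names neither); `harvest_by_date` and the record dictionaries, including the identifier `oai:eg:002`, are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode

# Harvester side: request only records with datestamps on or after 2003-01-01.
query = urlencode({"verb": "ListRecords", "metadataPrefix": "oai_dc", "from": "2003-01-01"})

# Repository side: select records whose datestamp lies within [from_, until].
def harvest_by_date(records, from_=None, until=None):
    """Return records in the inclusive date range; a missing bound is open-ended."""
    selected = []
    for rec in records:
        stamp = date.fromisoformat(rec["datestamp"])
        if from_ is not None and stamp < from_:
            continue
        if until is not None and stamp > until:
            continue
        selected.append(rec)
    return selected

records = [
    {"id": "oai:eg:001", "datestamp": "1999-01-01"},
    {"id": "oai:eg:002", "datestamp": "2003-01-15"},
]
print([r["id"] for r in harvest_by_date(records, from_=date(2003, 1, 1))])  # ['oai:eg:002']
```

A harvester that remembers the date of its last successful run can pass it as the lower bound next time and receive only new or changed records.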

30 selective harvesting - sets
[Diagram: harvest records within set S1 from a repository organized into sets S1 and S2]
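Set-based selection can be sketched as a membership test on each item's set labels. The colon-separated hierarchical labels (such as `S1:S2`) and the rule that harvesting `S1` also returns items filed under its subsets are assumptions borrowed from OAI-PMH 2.0's `setSpec` convention, since the slides leave set semantics to each community; `in_set`, `harvest_set`, and the item data are hypothetical.

```python
def in_set(item_sets, requested):
    """True if any of the item's set labels equals `requested` or lies beneath it."""
    return any(spec == requested or spec.startswith(requested + ":") for spec in item_sets)

def harvest_set(items, requested):
    """Return the identifiers of items that belong to `requested` or one of its subsets."""
    return [item["id"] for item in items if in_set(item["sets"], requested)]

items = [
    {"id": "a", "sets": ["S1"]},
    {"id": "b", "sets": ["S1:S2"]},  # subset S2 of S1
    {"id": "c", "sets": []},         # an item may belong to no set at all
    {"id": "d", "sets": ["S10"]},    # a different set whose name merely starts with "S1"
]
print(harvest_set(items, "S1"))  # ['a', 'b']
```

Matching on `requested + ":"` rather than a bare prefix keeps `S10` from being mistaken for a child of `S1`.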

31 set specifics
Repositories define a hierarchical organization.
Each item in a repository may be organized in one set, several sets, or no sets at all.
The meaning of sets or of the set hierarchy is not defined in the protocol.
Individual communities may formulate common set configurations.

32 HTTP encoding - requests
BASE-URL → an.oa.org/OAI-script
keyword arguments → verb=ListIdentifiers&set=S1
GET:
http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
POST:
POST http://an.oa.org/OAI-script HTTP/1.0
Content-Length: 27
Content-Type: application/x-www-form-urlencoded

verb=ListIdentifiers&set=S1
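Both encodings can be built with Python's standard `urllib`; this sketch constructs the requests without sending them, using the slide's example base URL. The variable names are illustrative. When a request with a body is actually sent, `urllib` fills in the `Content-Length` header itself.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://an.oa.org/OAI-script"  # example base URL from the slide
args = {"verb": "ListIdentifiers", "set": "S1"}

# GET: keyword arguments appended to the base URL as a query string.
get_req = Request(BASE_URL + "?" + urlencode(args), method="GET")

# POST: the same arguments sent form-encoded in the request body.
body = urlencode(args).encode("ascii")
post_req = Request(
    BASE_URL,
    data=body,
    headers={"Content-Type": "application/x-www-form-urlencoded"},
    method="POST",
)

print(get_req.full_url)  # http://an.oa.org/OAI-script?verb=ListIdentifiers&set=S1
print(len(body))         # 27
```

POST is useful when the argument list would make a GET URL awkwardly long; the repository must treat both encodings identically.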

33 HTTP encoding - responses
[Diagram: an XML response annotated with its parts: xml namespaces; the response header, with response date 2000-19-01T19:30:30-04:00 and request URL http://an.oa.org/OAI-script?verb=GetRecord&identifier=oai%3AarXiv%3A0001&metadataPrefix=oai_dc; and the response data (record contents)]
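A harvester reads the response header and data with any namespace-aware XML parser. This sketch assumes the OAI-PMH 2.0 envelope (its namespace URI and element names, which may differ from the protocol version these slides describe); the sample response is hypothetical and trimmed, and `summarize_response` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Illustrative response (values hypothetical), trimmed to the parts named on the slide.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2003-03-05T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:arXiv:0001"
           metadataPrefix="oai_dc">http://an.oa.org/OAI-script</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:arXiv:0001</identifier>
        <datestamp>2003-01-01</datestamp>
      </header>
    </record>
  </GetRecord>
</OAI-PMH>"""

def summarize_response(xml_text):
    """Pull the response header fields and record identifiers out of a response."""
    root = ET.fromstring(xml_text)
    request = root.find("oai:request", OAI_NS)
    return {
        "responseDate": root.findtext("oai:responseDate", namespaces=OAI_NS),
        "verb": request.get("verb"),
        "baseURL": request.text,
        "identifiers": [
            el.text
            for el in root.iterfind(".//oai:record/oai:header/oai:identifier", OAI_NS)
        ],
    }

print(summarize_response(SAMPLE)["identifiers"])  # ['oai:arXiv:0001']
```

Echoing the request in the response header lets a harvester check which query a stored response answered.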

34 metadata prefix and schema
Support for harvesting multiple metadata formats
– Metadata schema: each format must have a validating XML schema at a publicly accessible URL (communities may define shared formats and schemas)
– Metadata prefix: each repository maps a prefix to each schema it supports; the prefix is used in protocol requests
Support for unqualified Dublin Core is mandatory
– DC OAI record syntax that builds on the base DCMI schema
– Reserved prefix oai_dc
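A repository's prefix table can be sketched as a plain mapping from prefix to schema URL. The `oai_dc` schema location shown is the OAI-PMH 2.0 one; the `my_fmt` entry, its URL, and the `schema_for` helper are hypothetical. The error text borrows OAI-PMH 2.0's `cannotDisseminateFormat` error code.

```python
# Hypothetical repository configuration: metadata prefix -> schema URL.
FORMATS = {
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",  # reserved, mandatory
    "my_fmt": "http://example.org/schemas/my_fmt.xsd",           # hypothetical community format
}

def schema_for(prefix, formats=FORMATS):
    """Resolve a metadataPrefix from a request to the schema its records validate against."""
    if "oai_dc" not in formats:
        raise ValueError("repositories must support unqualified Dublin Core (oai_dc)")
    try:
        return formats[prefix]
    except KeyError:
        # OAI-PMH 2.0 reports this condition with the cannotDisseminateFormat error.
        raise LookupError(f"cannotDisseminateFormat: {prefix!r}") from None

print(schema_for("oai_dc"))  # http://www.openarchives.org/OAI/2.0/oai_dc.xsd
```

Because prefixes are local to a repository, harvesters should rely on the schema URL, not the prefix string, to decide whether two repositories share a format (with oai_dc as the one reserved exception).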

35 flow control
[Diagram: a harvester sends a protocol request to a repository]

36 flow control specifics
Applies to all protocol requests that return lists: ListRecords, ListIdentifiers, ListSets
The resumptionToken is opaque
The semantics of how responses are partitioned across resumption requests is undefined
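The harvesting loop this implies can be sketched against a toy in-memory repository. Everything here is hypothetical: `make_repository` stands in for a real data provider, and its `cursor-<offset>` token format is private to it. The harvester never parses the token, only hands it back, which is what “opaque” means in practice.

```python
def make_repository(identifiers, page_size=2):
    """Toy data provider serving ListIdentifiers-style pages of `page_size` items."""
    def list_identifiers(resumption_token=None):
        # Only the repository knows how to read its own token; here it encodes an offset.
        start = 0 if resumption_token is None else int(resumption_token.split("-")[1])
        page = identifiers[start:start + page_size]
        end = start + page_size
        token = f"cursor-{end}" if end < len(identifiers) else None
        return page, token
    return list_identifiers

def harvest_all(list_identifiers):
    """Harvester side: resubmit each returned token, unparsed, until none comes back."""
    harvested, token = [], None
    while True:
        page, token = list_identifiers(token)
        harvested.extend(page)
        if token is None:  # stands in for the absent or empty resumptionToken that ends a list
            return harvested

print(harvest_all(make_repository(["a", "b", "c", "d", "e"])))  # ['a', 'b', 'c', 'd', 'e']
```

Since partitioning is left to the repository, it is free to size pages by load or to encode query state in the token, without any change to harvesters.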

37 Extensibility Feature Summary
Multiple metadata formats
Collection-level metadata
– Identify “about” container
Record data
– Terms and conditions
– Provenance
Set structure
– Pre-configured “queries”

38 Supporting protocol requests: Identify, ListMetadataFormats, ListSets
Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord
[Diagram: a harvester (service provider) issues OAI protocol requests to a repository (data provider)]

39 Challenges and Questions
Utility of lowest-common-denominator metadata such as DC
Quality of metadata from non-professional contributors
Machine processing to reduce and complement human effort
Functionality of service structure

