Digital Library Interoperability Architecture CS 502 – 20030305 Carl Lagoze – Cornell University.

Presentation transcript:

Digital Library Interoperability Architecture CS 502 – Carl Lagoze – Cornell University

Interoperability is multidimensional Syntax –XML Semantics –RDF/RDFS/OWL Vocabularies/Ontologies –Dublin Core/ABC/CIDOC-CRM Search and discovery –Z39.50 –SDLIP –ZING Document models –METS –FEDORA

Contrast to Distributed Systems Distributed systems –Collections of components at different sites that are carefully designed to work with each other Heterogeneous or federated systems –Cooperating systems in which individual components are designed or operated autonomously

Measuring success of interoperability solutions Degree of component autonomy Cost of infrastructure Ease of contributing components Ease of using components Breadth of task complexity supported by the solution Scalability in the number of components

Families of interoperability solutions

Interoperability Trade-offs Cost Functionality HTTP Google Z39.50 SGML Dublin Core Metadata Harvesting Dienst

Cornell CS Dienst is a protocol and reference implementation of a distributed digital library service where a network of services provides World Wide Web browser access, uniform search over distributed indexes, and access to structured documents.

Why a service based protocol? Expose the operational semantics of the services through an API, to permit flexible integration of the services, and use of the services by other clients/consumers/services.

Defining the services Repository – deposit, storage, and access to structured documents. Index – process queries on documents and return handles. Query Mediator – route queries to appropriate indexes. Collection – define services and content in logical collections. User Interface – human-oriented front-end for services. Name Server – resolves URNs (handles) to document location(s).

Dienst Services WWW browser User Interface Repository Index Repository QM user query generic search request specific search request NS user document request URI document request Collection Collection metadata

Defining the protocol Structured messages –Service –Version –Verb –Arguments Template /Dienst/<Service>/<Version>/<Verb>/<Arguments>[?<Arguments>] Example /Dienst/Repository/4.0/Formats/ncstrl.cornell/TR
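The structured-message idea can be illustrated with a small parser. This is a minimal sketch, not the Dienst reference implementation: it assumes that the service, version, and verb are the first three path segments after /Dienst/, that further path segments are positional arguments, and that anything after "?" is a keyword argument. The names DienstRequest and parse_dienst_request are hypothetical.

```python
from typing import NamedTuple
from urllib.parse import parse_qs, urlsplit


class DienstRequest(NamedTuple):
    """Hypothetical container for the parts of a Dienst-style request."""
    service: str
    version: str
    verb: str
    path_args: list      # positional arguments that follow the verb
    keyword_args: dict   # optional ?key=value arguments


def parse_dienst_request(url_or_path: str) -> DienstRequest:
    """Split a Dienst-style request into service, version, verb, and arguments."""
    parts = urlsplit(url_or_path)
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 4 or segments[0] != "Dienst":
        raise ValueError("not a Dienst request: " + url_or_path)
    service, version, verb = segments[1:4]
    # Keep only the first value of each keyword argument (an assumption).
    keyword_args = {k: v[0] for k, v in parse_qs(parts.query).items()}
    return DienstRequest(service, version, verb, segments[4:], keyword_args)


# The example path exactly as it appears on the slide.
req = parse_dienst_request("/Dienst/Repository/4.0/Formats/ncstrl.cornell/TR")
print(req.service, req.version, req.verb, req.path_args)
```

Because the message parts are explicit, a client can dispatch on (service, verb) without knowing how the server implements the request.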

Why a Document Model? “Documents” in current web are both: –Unstructured (GET) –Chaotic (CGI) Different views and pieces of contents are needed for: –Bandwidth reduction –Rights management –Usability

Dienst Document Model Metadata – support for multiple descriptive formats Views – alternative expression or structural representation of the content encapsulated in the digital object Divs – hierarchically nested structure contained in a view

Expressing the document model in the protocol Structure – expose the views and structure for the digital object Disseminate – select the structural component (and packaging of it) to disseminate List-Meta-Formats – list available descriptive formats

Protocol Demonstration
http://techreports.library.cornell.edu:8081/Dienst/Repository/4.0/List-Contents?file-after=
http://techreports.library.cornell.edu:8081/Dienst/Repository/1.0/Disseminate/cul.cs/TR /%23oams/xml
http://techreports.library.cornell.edu:8081/Dienst/Repository/2.0/Structure/cul.cs/TR
http://techreports.library.cornell.edu:8081/Dienst/Repository/4.0/Formats/cul.cs/TR ?part=body
http://techreports.library.cornell.edu:8081/Dienst/Repository/1.0/Disseminate/cul.cs/TR /body/inline?pageimage=3

Cornell CS Collection Service Periodically polled by each user interface server for –elements of the collection –index servers for the collection User Interface Servers Index Servers

Deploying Collection Globally Internet connectivity varies considerably Good connectivity between nodes often does not correspond to geographic proximity Connectivity Region - a group of nodes on the network that among them have good connectivity, relative to nodes outside of the region.

Connectivity Regions When possible, route queries within the region. In case of failure, use an alternate either within the region or in a “nearby” region.
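The routing rule above can be sketched as a simple failover search. This is an illustrative sketch only: the region names, server names, the ordered "nearby" table, and the is_up health check are all hypothetical stand-ins, and the slides do not say how Dienst actually ranks nearby regions.

```python
def pick_index_server(client_region, servers, nearby, is_up):
    """Return the first reachable server, preferring the client's own region.

    servers: dict mapping region -> list of server names (hypothetical)
    nearby:  dict mapping region -> ordered list of "nearby" regions (hypothetical)
    is_up:   callable(server) -> bool, a stand-in for a real health check
    """
    # Try the home region first, then each nearby region in order.
    for region in [client_region] + nearby.get(client_region, []):
        for server in servers.get(region, []):
            if is_up(server):
                return server
    return None  # no reachable server in the home or nearby regions


servers = {"us-east": ["ix1", "ix2"], "europe": ["ix3"]}
nearby = {"us-east": ["europe"]}
down = {"ix1", "ix2"}

# Both home-region servers are down, so the query falls over to "europe".
print(pick_index_server("us-east", servers, nearby, lambda s: s not in down))
```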

Origins of the OAI Increasing interest in alternative scholarly publishing solutions – e.g., LANL arXiv Increasing impact through federation UPS Mtg., Santa Fe, October 1999 –Representatives of various ePrint, library, publishing, communities –Goal: definition of an interoperability framework among ePrint providers –Reality: Rich interoperability protocols like Dienst are too complicated for widespread deployment –Result: Santa Fe Convention, interoperability through metadata harvesting

Discovery Current Awareness Preservation Service Providers Data Providers Metadata harvesting The World According to OAI

Yes, it's about resource discovery over distributed collections metadata Author Title Abstract Identifier

Facilitating/Monitoring Longevity of Distributed Content Preservation Service

DigitalObject Realaudio video Powerpoint presentation SMIL synchronization metadata structural metadata Portal APortal B View A: View Slides View Video View synchronized presentation using applet View B: Get Transcript of Audio Search for keyword Get Slides translated to French Tool Repository Personalization of Content

Cross-Repository Reference Linking citation metadata citation metadata citation metadata citation metadata citation metadata Linkage Service

OAI Technical Infrastructure Key technical features Deploy now technology – 80/20 rule Two-party model – providers (data providers) and consumers (service providers) Simple HTTP encoding XML schema for some degree of protocol conformance Extensibility –Multiple item-level metadata –Collection level metadata

Content and Metadata resource Item (metadata) repository record

record oai:eg: My Example No restrictions protocol support format-specific metadata community-specific record data

selective harvesting - datestamps repository harvest within date range record

selective harvesting - sets repository harvest within set S1 record S2

set specifics repositories define hierarchical organization each item in a repository may be organized in one set, several sets, or no sets at all meaning of sets or of set hierarchy is not defined in protocol individual communities may formulate common set configurations
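Selective harvesting by datestamp and by set can be sketched as a filter over record headers. This is a toy in-memory model, not a repository implementation. It assumes inclusive date bounds and the colon-separated setSpec path notation of OAI-PMH 2.0 (for example "physics:hep" below "physics"), and it assumes that harvesting a set also returns items in its descendants. All identifiers and set names are hypothetical.

```python
from datetime import date

# Hypothetical record headers: identifier, datestamp, and zero or more setSpecs.
RECORDS = [
    {"identifier": "oai:example:1", "datestamp": date(2003, 1, 10), "sets": ["physics"]},
    {"identifier": "oai:example:2", "datestamp": date(2003, 2, 20), "sets": ["physics:hep"]},
    {"identifier": "oai:example:3", "datestamp": date(2003, 3, 1), "sets": ["math"]},
    {"identifier": "oai:example:4", "datestamp": date(2003, 3, 4), "sets": []},
]


def in_set(record_sets, requested):
    """True if any setSpec of the record equals `requested` or lies below it."""
    return any(s == requested or s.startswith(requested + ":") for s in record_sets)


def select_records(records, from_=None, until=None, set_spec=None):
    """Return identifiers matching an optional date range and an optional set."""
    selected = []
    for rec in records:
        if from_ is not None and rec["datestamp"] < from_:
            continue
        if until is not None and rec["datestamp"] > until:
            continue
        if set_spec is not None and not in_set(rec["sets"], set_spec):
            continue
        selected.append(rec["identifier"])
    return selected


print(select_records(RECORDS, from_=date(2003, 2, 1), set_spec="physics"))
```

An item with no sets is still harvestable; it simply never matches a set-restricted request.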

HTTP encoding - requests
BASE-URL --> an.oa.org/OAI-script
keyword arguments --> verb=ListIdentifiers&set=S1
GET
POST
POST HTTP/1.0
Content-Length: 78
Content-Type: application/x-www-form-urlencoded
verb=ListIdentifiers&set=S1
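Both encodings carry the same keyword arguments, which a short sketch makes concrete. The code below only builds the request strings and sends nothing over the network. The http:// scheme on the base URL is an assumption (the slide shows only the host and path), and get_url and post_request are hypothetical helper names.

```python
from urllib.parse import urlencode

# Scheme assumed; the slide shows only "an.oa.org/OAI-script".
BASE_URL = "http://an.oa.org/OAI-script"


def get_url(base_url, **args):
    """GET encoding: keyword arguments are appended to the base URL after '?'."""
    return base_url + "?" + urlencode(args)


def post_request(base_url, **args):
    """POST encoding: the same arguments travel form-encoded in the request body."""
    body = urlencode(args)
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Content-Length": str(len(body)),  # length of the encoded body in bytes
    }
    return headers, body


print(get_url(BASE_URL, verb="ListIdentifiers", set="S1"))
headers, body = post_request(BASE_URL, verb="ListIdentifiers", set="S1")
print(headers["Content-Length"], body)
```

Computing Content-Length from the encoded body avoids a mismatch between the header and what is actually sent.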

HTTP encoding - responses T19:30:30-04:00 &identifier=oai%3AarXiv%3A0001 &metadataPrefix=oai_dc record contents response header xml namespaces response data

metadata prefix and schema support for harvesting multiple metadata formats –metadata schema: each format must have a validating XML schema at a publicly accessible URL (communities may define shared formats and schemas). –metadata prefix: each repository maps a prefix to the schema it supports, which is used in protocol requests. support for unqualified Dublin Core is mandatory –DC OAI record syntax that builds on base DCMI schema –reserved prefix oai_dc.
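The prefix-to-schema mapping amounts to a small lookup table kept by each repository. The sketch below is illustrative: the oai_dc schema URL is the one published for OAI-PMH 2.0, while the "example_fmt" entry and its URL are hypothetical. The cannotDisseminateFormat label is borrowed from the OAI-PMH 2.0 error codes and is raised here as a plain Python exception, which is an assumption about how a repository might signal it internally.

```python
# Prefix -> validating XML schema URL, as one repository might configure it.
SUPPORTED_FORMATS = {
    # Mandatory unqualified Dublin Core, under the reserved prefix.
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc.xsd",
    # Hypothetical community-specific format.
    "example_fmt": "http://schemas.example.org/example_fmt.xsd",
}


def schema_for(prefix, formats=SUPPORTED_FORMATS):
    """Resolve the metadataPrefix of a request to the schema URL it stands for."""
    try:
        return formats[prefix]
    except KeyError:
        raise ValueError("cannotDisseminateFormat: " + prefix) from None


def is_valid_configuration(formats):
    """A repository must at least support the reserved oai_dc prefix."""
    return "oai_dc" in formats


print(schema_for("oai_dc"))
```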

flow control protocol request harvester repository

flow control specifics applies to all protocol requests that return lists: ListRecords, ListIdentifiers, ListSets resumptionToken is opaque semantics of partitioning of responses within resumption requests is undefined
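The harvester side of flow control reduces to a loop that keeps re-issuing the request with the latest resumptionToken until none is returned. The following is a minimal sketch against a toy in-memory repository, with no HTTP involved. The token here happens to encode an offset, but that is an implementation detail of this toy: the harvester never inspects it, in keeping with the token being opaque. The function names and page size are hypothetical.

```python
def list_identifiers(repository, resumption_token=None, page_size=2):
    """Toy repository: return one partial list plus a token for the next one."""
    start = int(resumption_token) if resumption_token else 0
    chunk = repository[start:start + page_size]
    next_start = start + page_size
    # A token is issued only while more items remain.
    token = str(next_start) if next_start < len(repository) else None
    return chunk, token


def harvest(repository):
    """Harvester loop: pass the token back unchanged until the list is complete."""
    identifiers, token = [], None
    while True:
        chunk, token = list_identifiers(repository, token)
        identifiers.extend(chunk)
        if token is None:
            return identifiers


repo = ["oai:example:%d" % i for i in range(1, 6)]  # hypothetical identifiers
print(harvest(repo))
```

Since the partitioning is left undefined, the harvester should not assume a fixed chunk size or a stable number of responses.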

Extensibility Feature Summary Multiple metadata formats Collection level metadata –Identify “about” container Record data –Terms and conditions –Provenance Set structure –Pre-configured “queries”

Supporting protocol requests: Identify, ListMetadataFormats, ListSets. Harvesting protocol requests: ListRecords, ListIdentifiers, GetRecord. repository harvester service provider data provider OAI Protocol

Challenges and Questions Utility of lowest common denominator metadata such as DC Quality of metadata from non-professional contributors Machine processing to reduce and complement human effort Functionality of service structure