Metadata Harvesting Interoperable digital collections.


Distributed libraries
The reality in most digital libraries is that no one location has all the materials that may be of interest. It is often more efficient to allow a number of sites each to retain some of the materials. How can we assure clients that they will see all relevant resources, regardless of which library they search?

Two basic approaches
One service provider with access to resources stored in multiple locations
  –Information about all the resources is held at the service provider.
  –Services (DL scenarios) use the information to provide connections to resources at multiple locations.
Distributed services
  –Information is kept with the resources.
  –Services, local to each collection, interact with other collection sites.

Two protocols
Z39.50
  –Developed before the web
  –Protocol for communicating with collection holders in order to provide services
Open Archives Initiative
  –Recent innovation
  –Central service provider gathers information from collection holders

Z39.50 briefly
Information Retrieval Service Definition and Protocol Specifications for Library Applications
Initially developed over the OSI network standards
Protocol for information exchange
  –Frees the information seeker from the need to know the details of the target database configuration
Each site provides services
  –Each service queries remote sites for needed information
Information requests are mapped to database queries at the collection site.
Queries are interpreted somewhat inconsistently from site to site.

Distributed Resources, Multiple Services
Approach 1: One service provider gathers information about data and uses it to provide services.
[Diagram: one service provider (search, browse, compare, etc.) drawing on several data providers]

Distributed data and services
Approach 2: Each system is both a data repository and a service provider. Services query other data providers as needed.
[Diagram: peer systems, each offering services such as search, browse, and compare]

Hybrid systems
Each server is likely to have its own clients. The difference is whether the information exchange is periodic or ad hoc.
[Diagram: service providers (search, browse, compare, etc.) and data providers]

Open Archives Initiative (OAI)
Web-based
  –Uses HTTP to communicate between sites
Centralized server
  –Services are provided from a site that has already gathered the information it needs for those services from a distributed collection of sites.

Z39.50
Special-purpose protocol (machine to machine, not a web interface)
Gathers information when it is requested, not on a scheduled basis.

OAI Compared to Z39.50
                        Z39.50            OAI
Content (Objects)       Distributed       Distributed
World View              Bibliographic     Bibliographic
Object Presentation     Data provider     Data provider
Searching is            Distributed       Centralized
Search done by          Data provider     Service provider
Metadata searched is    Up to date        Stale
Semantic Mapping        When searching    Metadata delivery
Source: oai.grainger.uiuc.edu/FinalReport/JCDL_2003_OAI_Intro.ppt
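The "Up to date" versus "Stale" contrast can be seen in a toy model. This is only a sketch: DataProvider, ServiceProvider, and federated_search are hypothetical names invented for illustration, and neither protocol is actually spoken here.

```python
class DataProvider:
    """A site that holds part of the distributed collection."""

    def __init__(self, titles):
        self.titles = list(titles)

    def search(self, term):
        return [t for t in self.titles if term.lower() in t.lower()]


def federated_search(providers, term):
    """Z39.50 style: every provider is queried at search time."""
    return [hit for p in providers for hit in p.search(term)]


class ServiceProvider:
    """OAI style: metadata is gathered ahead of time and searched locally."""

    def __init__(self):
        self.index = []

    def harvest(self, providers):
        # Copy what the providers hold right now; later changes are not seen
        # until the next harvest.
        self.index = [t for p in providers for t in p.titles]

    def search(self, term):
        return [t for t in self.index if term.lower() in t.lower()]


site_a = DataProvider(["Metadata basics"])
site_b = DataProvider(["Harvesting metadata"])
central = ServiceProvider()
central.harvest([site_a, site_b])

# A new item arrives after the harvest.
site_b.titles.append("Metadata quality")

live = federated_search([site_a, site_b], "metadata")
stale = central.search("metadata")
print(len(live), len(stale))
```

The federated search sees the new item at once; the central index sees it only after the next call to harvest.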

Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)
[Diagram: the Harvester (service provider) sends an HTTP request carrying an OAI verb to the Repository (metadata provider), which returns an HTTP response in XML]
OAI-PMH defines an interface between the Harvester and any number of Repositories.
Implemented as CGI, ASP, PHP, or other
Any system may serve as a harvester, repository, or both.
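Because a request is an ordinary HTTP request whose query string names an OAI verb, a harvester can build one with nothing more than URL encoding. A minimal sketch, assuming a hypothetical base URL (repository.example.org) and a helper name, build_request, invented for illustration:

```python
from urllib.parse import urlencode


def build_request(base_url, verb, arguments=None):
    """Return the GET URL for one OAI-PMH request."""
    params = {"verb": verb}
    # Arguments left as None are treated as "not supplied".
    for name, value in (arguments or {}).items():
        if value is not None:
            params[name] = value
    return base_url + "?" + urlencode(params)


url = build_request(
    "http://repository.example.org/oai",  # hypothetical repository
    "GetRecord",
    {"identifier": "oai:ethnologue.com:aaa", "metadataPrefix": "oai_dc"},
)
print(url)
```

The arguments are passed as a dictionary rather than keyword arguments because one of the protocol's argument names, from, is a reserved word in Python.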

OAI components
Service Providers and Data Providers
Requests and Responses

Records
Metadata of a resource. Three parts:
  –Header (required)
      Identifier (required: 1 only)
      Datestamp (required: 1 only)
      setSpec elements (optional: 0, 1, or more)
      Status attribute for deleted item
  –Metadata (required)
      XML-encoded metadata with root tag, namespace
      Repositories must support Dublin Core; other formats optional
  –“About” statement (optional)
      Rights statements
      Provenance statements
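The header parts listed above can be read out of a record with the standard library XML parser. This is a sketch under assumptions: the sample record, its identifier, and its title are invented, and read_header is a hypothetical helper name. The namespace URI is the one used by OAI-PMH 2.0.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical record, invented for illustration.
SAMPLE_RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:repository.example.org:item-1</identifier>
    <datestamp>2002-03-25</datestamp>
    <setSpec>biology</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>A hypothetical title</dc:title>
    </oai_dc:dc>
  </metadata>
</record>
"""


def read_header(record):
    """Collect the header fields of one <record> element."""
    header = record.find("oai:header", OAI_NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=OAI_NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=OAI_NS),
        # Zero, one, or more setSpec elements may be present.
        "sets": [s.text for s in header.findall("oai:setSpec", OAI_NS)],
        # A deleted item is flagged by status="deleted" on the header.
        "deleted": header.get("status") == "deleted",
    }


info = read_header(ET.fromstring(SAMPLE_RECORD))
print(info)
```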

Identifiers
Globally unique identifier
Valid URI
  –Examples
      oai: :
      oai:etd.vt.edu:etd
  –Must resolve to one item
      No duplicates
      No reuse of previously used identifiers
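The examples suggest a three-part pattern: the literal scheme oai, a repository name, and a local identifier, separated by colons. A small sketch of splitting such an identifier, assuming that pattern holds; split_oai_identifier is a hypothetical helper, and a repository may use any other valid URI instead.

```python
def split_oai_identifier(identifier):
    """Split 'oai:<repository>:<local id>' into (repository, local id)."""
    scheme, _, rest = identifier.partition(":")
    repository, _, local_id = rest.partition(":")
    if scheme != "oai" or not repository or not local_id:
        raise ValueError("not an oai-scheme identifier: %r" % identifier)
    return repository, local_id


print(split_oai_identifier("oai:ethnologue.com:aaa"))
```

The local part is kept whole even if it contains further colons, since only the first two colons are treated as delimiters.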

Datestamps
Date of last modification of a record
  –Used only for harvesting (meta metadata?)
Mandatory for each item in the repository
Two levels of granularity possible
  –YYYY-MM-DD
  –YYYY-MM-DDThh:mm:ssZ
      T separates the date from the time; the trailing Z marks the time zone, which must be UTC (GMT)
Allows harvesting incrementally: get only what is new since the last visit
  –Accessed by the arguments from and until
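Incremental harvesting comes down to remembering when the last harvest ran and sending that moment as the from argument, in UTC and at a granularity the repository supports. A minimal sketch; to_datestamp and incremental_arguments are hypothetical helper names, and the last-visit time is invented.

```python
from datetime import datetime, timedelta, timezone


def to_datestamp(moment, granularity="YYYY-MM-DDThh:mm:ssZ"):
    """Format a timezone-aware datetime as an OAI-PMH datestamp in UTC."""
    utc = moment.astimezone(timezone.utc)
    if granularity == "YYYY-MM-DD":
        return utc.strftime("%Y-%m-%d")
    if granularity == "YYYY-MM-DDThh:mm:ssZ":
        return utc.strftime("%Y-%m-%dT%H:%M:%SZ")
    raise ValueError("unknown granularity: %r" % granularity)


def incremental_arguments(last_visit, granularity, metadata_prefix="oai_dc"):
    """Arguments for a harvest of everything changed since last_visit."""
    return {
        "metadataPrefix": metadata_prefix,
        "from": to_datestamp(last_visit, granularity),
    }


# Hypothetical last visit: 20:30 local time, five hours behind UTC.
last_visit = datetime(2002, 3, 25, 20, 30, 0,
                      tzinfo=timezone(timedelta(hours=-5)))
print(incremental_arguments(last_visit, "YYYY-MM-DDThh:mm:ssZ"))
print(incremental_arguments(last_visit, "YYYY-MM-DD"))
```

The example time falls on the following day in UTC, which is why the conversion has to happen before the date is formatted.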

The OAI-PMH verbs
Each requests a specific response from a data repository.

Identify
Function: Description of the archive
Example:
Parameters: none
Errors/exceptions:
  –badArgument (the request should carry no arguments besides the verb)
Response format:
  Element             Example                                Ordinality ‡
  repositoryName      My Archive                             1
  baseURL                                                    1
  protocolVersion                                            1
  earliestDatestamp                                          1
  deletedRecord       no, transient, persistent              1
  granularity         YYYY-MM-DD, YYYY-MM-DDThh:mm:ssZ       1
  adminEmail                                                 +
  compression         deflate, compress                      *
  description         oai-identifier, eprints, friends, …    *
‡ Ordinality: 1 = mandatory, 1 only; + = mandatory, 1 or more; * = optional, 0 or more

Actual response (XML markup not preserved; surviving values only)
  responseDate: T01:37:44Z
  request: bin/olaca3.pl
  repositoryName: OLAC Aggregator
  deletedRecord: no
  granularity: YYYY-MM-DD
  comment in the response: maybe later identity
  description (oai-identifier): oai, OLACA.language-archives.org, :, oai:ethnologue.com:aaa
  description (archive): Steven Bird & Gary Simons, Coordinators, Open Language Archives Community, Philadelphia, U.S.A.
      This repository contains all records from OLAC-registered archives. It is intended to be used by services which do not want to harvest individual OLAC archives.
      Metadata may be used only subject to the access permissions given by the individual archives.
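A harvester usually reads the Identify response first, because it tells the harvester which datestamp granularity to use and whether deletions are tracked. The sketch below parses a hypothetical response: every value in SAMPLE_IDENTIFY is invented for illustration and is not the OLAC response. Element names follow the OAI-PMH 2.0 specification; parse_identify is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical response; all values are made up.
SAMPLE_IDENTIFY = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-03-25T12:00:00Z</responseDate>
  <request verb="Identify">http://repository.example.org/oai</request>
  <Identify>
    <repositoryName>My Archive</repositoryName>
    <baseURL>http://repository.example.org/oai</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>admin@repository.example.org</adminEmail>
    <earliestDatestamp>1999-01-01</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DD</granularity>
  </Identify>
</OAI-PMH>"""

SINGLE_VALUED = ["repositoryName", "baseURL", "protocolVersion",
                 "earliestDatestamp", "deletedRecord", "granularity"]


def parse_identify(xml_text):
    """Return the mandatory Identify fields as a dictionary."""
    identify = ET.fromstring(xml_text).find(OAI + "Identify")
    info = {name: identify.findtext(OAI + name) for name in SINGLE_VALUED}
    # adminEmail may occur more than once, so it is kept as a list.
    info["adminEmail"] = [e.text for e in identify.findall(OAI + "adminEmail")]
    return info


info = parse_identify(SAMPLE_IDENTIFY)
print(info["repositoryName"], info["granularity"], info["deletedRecord"])
```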

ListMetadataFormats
Function: retrieve available metadata formats from archive
Example: archive.org/oai-script?verb=ListMetadataFormats&identifier=oai:HUBerlin.de:
Parameters: identifier (optional)
Errors/exceptions:
  –badArgument
  –idDoesNotExist
  –noMetadataFormats

Response to olaca3.pl?verb=ListMetadataFormats (XML markup not preserved; surviving values only)
  responseDate: T01:58:06Z
  request: bin/olaca3.pl
  metadataPrefix values: olac, olac_display, oai_dc
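Once the available prefixes are known, a harvester can pick the richest format it understands and fall back to oai_dc, which every repository must support. A sketch under assumptions: the sample XML is trimmed to the metadataPrefix elements only (a real response also carries a schema and a namespace for each format), and list_prefixes and choose_prefix are hypothetical helpers.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Trimmed, hypothetical response listing the three prefixes shown above.
SAMPLE_FORMATS = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat><metadataPrefix>olac</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>olac_display</metadataPrefix></metadataFormat>
    <metadataFormat><metadataPrefix>oai_dc</metadataPrefix></metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""


def list_prefixes(xml_text):
    """Return the metadataPrefix values in document order."""
    root = ET.fromstring(xml_text)
    return [e.text for e in root.iter(OAI + "metadataPrefix")]


def choose_prefix(available, preferred):
    """Pick the first preferred prefix on offer, else oai_dc, else None."""
    for prefix in preferred:
        if prefix in available:
            return prefix
    return "oai_dc" if "oai_dc" in available else None


available = list_prefixes(SAMPLE_FORMATS)
print(choose_prefix(available, ["marc21", "olac"]))
```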

ListSets
Function: retrieve set structure of a repository
Example: archive.org/oai-script?verb=ListSets
Parameters: resumptionToken (exclusive)
Errors/exceptions:
  –badArgument
  –badResumptionToken
  –noSetHierarchy
Sets are optional and are used to divide a repository into separate units that will be of interest to different harvesters.

ListIdentifiers
Function: abbreviated form of ListRecords; retrieves only headers
Example: archive.org/oai-script?verb=ListIdentifiers&metadataPrefix=oai_dc&from=
Parameters:
  –from (optional)
  –until (optional)
  –metadataPrefix (required)
  –set (optional)
  –resumptionToken (exclusive)
Errors/exceptions:
  –badArgument
  –badResumptionToken
  –cannotDisseminateFormat
  –noRecordsMatch
  –noSetHierarchy
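The parameter rules above (one required argument, three optional ones, and a resumptionToken that excludes all the others) can be checked on the harvester side before a request is sent, which avoids a badArgument response. A minimal sketch; check_list_arguments is a hypothetical helper name.

```python
ALLOWED = {"from", "until", "metadataPrefix", "set", "resumptionToken"}


def check_list_arguments(arguments):
    """Raise ValueError if the arguments would draw a badArgument error."""
    unknown = set(arguments) - ALLOWED
    if unknown:
        raise ValueError("unknown arguments: %s" % ", ".join(sorted(unknown)))
    if "resumptionToken" in arguments:
        # Exclusive: no other argument may accompany the token.
        if len(arguments) > 1:
            raise ValueError("resumptionToken must be the only argument")
    elif "metadataPrefix" not in arguments:
        raise ValueError("metadataPrefix is required")
    return arguments


print(check_list_arguments({"metadataPrefix": "oai_dc", "set": "biology"}))
```

The same check applies to ListRecords, which takes the same parameters.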

ListRecords
Function: harvest records from a repository
Example: archive.org/oai-script?verb=ListRecords&metadataPrefix=oai_dc&set=biology
Parameters:
  –from (optional)
  –until (optional)
  –metadataPrefix (required)
  –set (optional)
  –resumptionToken (exclusive)
Errors/exceptions:
  –badArgument
  –badResumptionToken
  –cannotDisseminateFormat
  –noRecordsMatch
  –noSetHierarchy
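A large result list comes back in pieces: each partial response ends with a resumptionToken, which the harvester sends back, alone, to get the next piece. The loop can be sketched as follows. The two XML pages, the base URL, and the function names are all invented for illustration, and the fetch function is passed in so the loop can be tried without a network; http_fetch shows what a real one might look like.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def http_fetch(url):
    """Fetch one response body over HTTP (not used in the demo below)."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def harvest(fetch, base_url, arguments):
    """Yield every <record> element, following resumption tokens."""
    params = dict(arguments, verb="ListRecords")
    while True:
        root = ET.fromstring(fetch(base_url + "?" + urlencode(params)))
        error = root.find(OAI + "error")
        if error is not None:
            if error.get("code") == "noRecordsMatch":
                return  # nothing to harvest is not a failure
            raise RuntimeError(error.get("code"))
        listing = root.find(OAI + "ListRecords")
        yield from listing.findall(OAI + "record")
        token = listing.findtext(OAI + "resumptionToken")
        if not token:
            return  # a missing or empty token marks the last piece
        # The token is exclusive: drop every other argument.
        params = {"verb": "ListRecords", "resumptionToken": token}


# Two hypothetical pages standing in for a repository.
PAGE_1 = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
          '<record><header><identifier>oai:example.org:1</identifier></header></record>'
          '<resumptionToken>page-2</resumptionToken></ListRecords></OAI-PMH>')
PAGE_2 = ('<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>'
          '<record><header><identifier>oai:example.org:2</identifier></header></record>'
          '<resumptionToken/></ListRecords></OAI-PMH>')

requested = []


def fake_fetch(url):
    requested.append(url)
    return PAGE_2 if "resumptionToken=page-2" in url else PAGE_1


records = harvest(fake_fetch, "http://repository.example.org/oai",
                  {"metadataPrefix": "oai_dc", "set": "biology"})
identifiers = [r.findtext(OAI + "header/" + OAI + "identifier") for r in records]
print(identifiers)
```

Swapping fake_fetch for http_fetch would run the same loop against a live repository.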

GetRecord
Function: retrieve an individual metadata record from a repository
Example: archive.org/oai-script?verb=GetRecord&identifier=oai:HUBerlin.de: &metadataPrefix=oai_dc
Parameters:
  –identifier (required)
  –metadataPrefix (required)
Errors/exceptions:
  –badArgument
  –cannotDisseminateFormat
  –idDoesNotExist
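The errors listed for each verb arrive inside the XML response as error elements with a code attribute, so a harvester has to inspect the body of the response. A sketch of pulling them out of a hypothetical response; the sample XML and the helper name oai_errors are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"

# Hypothetical error response to a GetRecord request.
SAMPLE_ERROR = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-03-25T12:00:00Z</responseDate>
  <request verb="GetRecord">http://repository.example.org/oai</request>
  <error code="idDoesNotExist">No matching identifier</error>
</OAI-PMH>"""


def oai_errors(xml_text):
    """Return (code, message) pairs; an empty list means no error."""
    root = ET.fromstring(xml_text)
    return [(e.get("code"), (e.text or "").strip())
            for e in root.findall(OAI + "error")]


errors = oai_errors(SAMPLE_ERROR)
print(errors)
```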

Interoperability
The goal: communication, without human intervention, between information sources
  –Books that “talk to each other”
      Live links for references
      Knowledge of how to find relevant resources when needed
      Ability to query other information locations

Protocols
Precise rules for interactions between independent processes
  –Format of the messages
      Both structure and content
  –Specified behavior in response to specific messages
Many ways to accomplish the same result, but both sides must have the same understanding of the rules of engagement.

Protocol Types
RPC model
  –Point to point
  –Completely open to definition by developer
      Verbs (methods)
      Nouns (objects, resources)
  –Useful to a closed community or group who know about the availability of the resource.

SOAP
The original expansion of the acronym (Simple Object Access Protocol) has been dropped.
Initially developed as part of the Microsoft .NET paradigm
  –Now in W3C committee
Stateless, one-way message exchange paradigm
XML encoded
Flexibility of RPC, but more constrained in the way communication is formatted.

REST
REpresentational State Transfer
An after-the-fact definition of the architecture of the World Wide Web
The model is
  –Client/server
  –Stateless
  –Cacheable
  –Layered
Resource interface constrained
  –Restricted verbs
  –Restricted content types

REST and RPC
RPC provides flexibility for any type of interaction between any type of resources.
REST provides consistency to allow interaction among resources without prior discovery of accepted actions and responses.

SOAP and REST
Debate in the Web community about which is the better paradigm for application development
REST: restricted, but a simple extension of existing Web processes
SOAP: added flexibility, at a cost in bandwidth, security, and development complexity
