Introduction to the OAI Metadata Harvesting Protocol Hussein Suleman, Digital Library Research Laboratory Virginia Tech.

Slides:



Advertisements
Similar presentations
OAI from 50,000 Feet OAI develops and promotes interoperability solutions that aim to facilitate the efficient dissemination of content. Begun in 1999.
Advertisements

Y.T. a brief history of the OAI 0 Kaynak: Herbert van de Sompel.
OAI in DigiTool DigiTool Version 3.0.
OAI-PMH Dawn Petherick, University Web Services Team Manager, Information Services, University of Birmingham MIDESS Dissemination.
ETD’s at the University of Saskatchewan or… David Fox & Darryl Friesen University of Saskatchewan October 4, 2003.
Building Digital Libraries on Open Archives Donatella Castelli IEI-CNR Italy.
OAI Standards for Sheet Music Meeting March 28-29, 2002 Basic OAI Principals How They Apply to Sheet Music Presenter: Curtis Fornadley, Senior Programmer/Analyst.
OAI-PMH at Yale Report on the DLF OAI Training Session November 10, 2005 Charlottesville, VA.
OCLC Online Computer Library Center Harvesting and Resolution Methods for Building OAI-based Services Jeffrey A. Young
Basic Concepts Architecture Topology Protocols Basic Concepts Open e-Print Archive Open Archive -- generalization of e-print Data Provider and Service.
A Digital Library Repository Utilizing the Open Archives Initiative Developed to meet the needs of UTK Library Special Collections.
Introduction to Digital Libraries hussein suleman uct cs honours 2004.
Thomas G. Habing – University of Illinois at Urbana-Champaign Recap: SIGIR 2001 OAI Workshop 19 September OAI Provider Workshop, University of.
How to participate in the Union Catalogue Project Hussein Suleman Sivulile – Open Access South Africa Advanced Information Management.
Open Archives Iniative – Protocol for Metadata Harvesting Iztok Kavkler, University of Ljubljana Some slides by Stefaan Ternier, KUL Bram Vandenputte,
Metadata Harvesting Interoperable digital collections.
Open Archives Initiative OAI openarchives.org “Opening Remarks & Historical Overview” - ACM SIGIR’2001 Ed Fox (w. Lagoze.
Herbert van de sompel Workshop on OAI and peer review journals in Europe Geneva, Switserland – March 22nd to 24th 2001 Herbert Van de Sompel Cornell University.
ALCME: OAI at OCLC Jeffrey A. Young OCLC Online Computer Library Center, Inc.
ALA 2002 LITA Open Source Software Open Archives Initiative Hussein Suleman AmericanSouth.org 14 June 2002.
Dec 9-11, 2003ICADL Challenges in Building Federation Services over Harvested Metadata Hesham Anan, Jianfeng Tang, Kurt Maly, Michael Nelson, Mohammad.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
Open Virginia Tech DLRL Hussein Suleman
1 NDLTD Welcome and Introduction ETD 2011: 14 th Int. Symp. on ETDs Cape Town, South Africa Edward A. Fox Executive Director, NDLTD,
OAI-PMH: Open Archives Initiative Protocol for Metadata Harvesting T.B. Rajashekar National Centre for Science Information (NCSI) Indian Institute of Science,
Introduction to Digital Libraries hussein suleman uct cs honours 2004.
The Open Archives Initiative Protocol for Metadata Harvesting: Overview Jewel Ward Visiting Scholar, Keio University Lib-Sys Seminar, Keio University,
The OAI Protocol for Metadata Harvesting Van de Sompel, Herbert Los Alamos National Laboratory – Research Library.
Metadata harvesting in regional digital libraries in PIONIER Network Cezary Mazurek, Maciej Stroiński, Marcin Werla, Jan Węglarz.
Digital Library Interoperability Architecture CS 502 – Carl Lagoze – Cornell University.
Tsinghua University Library Yang Zhao & Airong Jiang Tsinghua University Library, Beijing China 4 June, 2004 Electronic Thesis and Dissertation System.
Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH) Phil Barker, March © Heriot-Watt University. You may reproduce all or any part.
Open Archive Initiative – Protocol for metadata Harvesting (OAI-PMH) Surinder Kumar Technical Director NIC, New Delhi
Caltech CODA CODA: Collection of Digital Archives Caltech Scholarly Communication.
Slavic Digital Text Workshop 2006 The Open Archives Initiative Protocol for Metadata Harvesting: an Opportunity for Sharing Content in a Distributed Environment.
1 GRID Based Federated Digital Library K. Maly, M. Zubair, V. Chilukamarri, and P. Kothari Department of Computer Science Old Dominion University February,
OAI Protocol for Metadata Harvesting hussein suleman uct cs honours 2006.
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
Integrating Access to Digital Content Sarah Shreeves University of Illinois at Urbana-Champaign Visual Resources Association 23 rd Annual Conference Miami.
Search Interoperability, OAI, and Metadata Sarah Shreeves University of Illinois at Urbana-Champaign Basics and Beyond Grainger Engineering Library April.
SPASE and the VxOs Jim Thieman Todd King Aaron Roberts.
Enforcing Interoperability with the Open Archives Initiative Repository Explorer Hussein Suleman, Digital Library Research.
Hussein Suleman University of Cape Town Department of Computer Science Digital Libraries Laboratory February 2008 Data Curation Repositories:
Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives Hussein Suleman, Digital Library Research.
Building Interoperable and Accessible ETD Collections: A Practical Guide to Creating Open Archives Hussein Suleman, Digital.
The OAI: technical overview OAI Open Meeting – Washington DC – January 23 rd 2001 Herbert Van de Sompel & Carl Lagoze Cornell University -- Computer Science.
The Open Archives Initiative Marshall Breeding Director for Innovative Technologies and Research Vanderbilt University
Open Archives Initiative Protocol for Metadata Harvesting.
ETD Search Services Ming Luo Edward A. Fox Virginia Tech.
Metadata Harvesting Interoperable digital collections.
ETDs and NDLTD Hussein Suleman University of Cape Town May 2004.
Designing Protocols in Support of Digital Library Componentization Hussein Suleman and Edward A. Fox Digital Library Research Laboratory Virginia Tech.
2/22/2016J Ammerman1 Open Archives Initiative What is it? What’s it good for?
NSDL & the Open Archives Initiative A Brief Introduction to OAI Timothy W. Cole Mathematics Librarian & Professor of Library Administration.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
NDLTD Union Collection User Services Edward A. Fox Virginia Tech DLRL March 2001.
The NSDL, OAI and Your Metadata Core Infrastructure Metadata Repository (“union catalog”) Naomi Dushay Cornell University.
NDLTD Toward Universal Accessibility of ETDs: Building the NDLTD Union Archive Hussein Suleman, Edward A. Fox,
OAI and ODL Building Digital Libraries from Components Ryan Richardson Virginia Tech DLRL 18 September 2003.
OAI and ODL Building Digital Libraries from Components Hussein Suleman Virginia Tech DLRL 12 September 2002.
NDLTD Standards, Metadata and the OAI-PMH Hussein Suleman University of Cape Town October 2003.
OAI Protocol for Metadata Harvesting hussein suleman uct cs honours 2009.
Harvesting and Exporting Metadata 714: Metadata Margaret E.I. Kipp -
Getting a Leg Up on OAI for the NSDL
Building Interoperable Digital Libraries: A Practical Guide to creating Open Archives Hussein Suleman, Digital Library Research Laboratory.
Georges Arnaout Chaitanya Krishna
OAI and Metadata Harvesting
Enforcing Interoperability with the Open Archives Initiative Repository Explorer Hussein Suleman, Digital Library Research Laboratory Virginia.
Open Archive Initiative
IVOA Interoperability Meeting - Boston
Presentation transcript:

Introduction to the OAI Metadata Harvesting Protocol Hussein Suleman, Digital Library Research Laboratory Virginia Tech

SIGIR 2001 Slide 2 1. Introduction What is the OAI-MHP? General System Strategy Case study: NDLTD

SIGIR 2001 Slide What is the OAI-MHP ? What is the Metadata Harvesting Protocol? Protocol to transfer metadata from a source archive to a destination archive Any metadata In a continuous stream As simply as possible

SIGIR 2001 Slide General System Strategy Services Metadata Harvesting Document Model

SIGIR 2001 Slide Case Study: NDLTD Networked Digital Library of Theses and Dissertations Multiple independent university-based collections of electronic documents Virginia Tech Rhodes U. U.Waterloo International ETD Library OAI Metadata Harvesting Protocol

SIGIR 2001 Slide 6 2. Definitions / Concepts Basic Principles What is an Open Archive? Harvesting vs. Federation Data and Service Providers Underlying Technology HTTP and XML Protocol Policies What is a record? Multiplicity of Metadata Sets Datestamp, Harvesting and Flow Control

SIGIR 2001 Slide What is an Open Archive ? Any WWW-based system that can be accessed through the well-defined interface of the Open Archives Protocol for Metadata Harvesting … aka OAI-Compliant Repository No implications for: Physical storage of data Cost of data Metadata and data formats Access control to server

SIGIR 2001 Slide Harvesting vs Federation Competing approaches to interoperability Federation is when services are run remotely on remote data (e.g. Federated searching) Harvesting is when data/metadata is transferred from the remote source to the destination where the services are located (e.g. Union catalogues) Federation requires more effort at each remote source but is easier for the local system and vice versa for harvesting OAI currently focuses on harvesting

SIGIR 2001 Slide Data and Service Providers Data Providers refer to entities who possess data/metadata and are willing to share this with others (internally or externally) via well-defined OAI protocols (e.g. database servers) Service Providers are entities who harvest data from Data Providers in order to provide higher- level services to users (e.g. search engines) OAI uses these denotations for its client/server model (data=server, service=client)

SIGIR 2001 Slide HTTP and XML Metadata Harvesting Protocol is an almost stateless request/response protocol Requests and responses are sent via the HTTP protocol Requests are encoded as GET/POST operations Responses are well-formed XML documents

SIGIR 2001 Slide What is a record ? A record refers to an independent XML structure that may be associated with digital or physical objects Records are usually associated with metadata, not data OAI advocates harvesting of records, which contain metadata and additional fields to support the harvesting operation

SIGIR 2001 Slide Sample OAI Record oai:sigir:ws OAI Workshop at SIGIR Hussein Suleman English oai:sigir:ws3md

SIGIR 2001 Slide Multiplicity of Metadata Multiple formats of metadata allowed Dublin Core is mandatory Any other format allowed as long as it has an XML encoding E.g. MARC (Libraries), IMS (Education), ETDMS (Theses/Dissertations), RFC1807 (Bibliographies)

SIGIR 2001 Slide Sets Protocol mechanism to allow for harvesting of sub-collections No well-defined semantics – depends completely on local data providers May be defined by arrangement between data providers and service providers E.g. Subject areas, years, author names, search queries

SIGIR 2001 Slide Datestamps & Harvesting Each record needs a datestamp that indicates its date of creation or modification Dates are used to allow for harvesting by date range, thus allowing incremental and continuous transfer of metadata from a data provider to a service provider

SIGIR 2001 Slide Flow Control HTTP “retry-after” mechanism can be leveraged to support server-side delaying of a client’s request Resumption Tokens can be used to return partial results – the client is issued with a token which may be presented to the server to receive more results

SIGIR 2001 Slide Metadata Harvesting Protocol Service Requests Identify ListMetadataFormats ListSets GetRecord ListIdentifiers ListRecords Metadata Multiplicity Date Ranges Resumption Tokens

SIGIR 2001 Slide Identify Purpose Return general information about the archive and its policies Parameters None Sample URL

SIGIR 2001 Slide Identify - Response

SIGIR 2001 Slide ListMetadataFormats Purpose List metadata formats supported by the archive as well as their schema locations and namespaces Parameters identifier – for a specific record (O) Sample URL

SIGIR 2001 Slide ListMetadataFormats - Response

SIGIR 2001 Slide ListSets Purpose Provide a hierarchical listing of sets in which records may be organized Parameters None Sample URL

SIGIR 2001 Slide ListSets – Response

SIGIR 2001 Slide GetRecord Purpose Returns the metadata for a single identifier in the form of an OAI record Parameters identifier – unique id for record (R) metadataPrefix – metadata format (R) Sample URL verb=GetRecord&identifier=oai:test:123&metadataPrefix=oai_dc

SIGIR 2001 Slide GetRecord - Response

SIGIR 2001 Slide ListIdentifiers Purpose List all unique identifiers corresponding to records in the repository Parameters from – start date (O) until – end date (O) set – set to harvest from (O) resumptionToken – flow control mechanism (X) Sample URL

SIGIR 2001 Slide ListIdentifiers - Response

SIGIR 2001 Slide ListRecords Purpose Retrieves metadata for multiple records Parameters from – start date (O) until – end date (O) set – set to harvest from (O) resumptionToken – flow control mechanism (X) metadataPrefix – metadata format (R) Sample URL verb=ListRecord&metadataprefix=oai_dc&from=

SIGIR 2001 Slide ListRecords - Response

SIGIR 2001 Slide Metadata Multiplicity

SIGIR 2001 Slide Date Ranges

SIGIR 2001 Slide Resumption Token

SIGIR 2001 Slide 33 That’s All Folks !

The OAI Metadata Harvesting Protocol - Communities and Services Hussein Suleman, Digital Library Research Laboratory Virginia Tech

SIGIR 2001 Slide Service Providers Harvesting 101/102/103 Scheduling Tools Repository Explorer Case Study: ARC Case Study: NDLTD VTLS Virtua

SIGIR 2001 Slide Harvesting 101 ListRecords (from= ) Response ListRecords (resumptionToken=1) Response ListRecords (from= ) Response resumption Token=1 SERVICE PROVIDER DATA PROVIDER Set date=09-13 DAY ONE DAY TWO Set date=

SIGIR 2001 Slide Harvesting 102 ListMetadataFormats Response ListIdentifiers Response GetRecord (id=1, prefix=oai_dc) Response Identifier:1 Identifier:2 Identifier:3 SERVICE PROVIDER DATA PROVIDER oai_dc oai_rfc1807 GetRecord (id=2, prefix=oai_dc) Response... record1 record2

SIGIR 2001 Slide Harvesting 103 ListMetadataFormats (id=1) Response ListIdentifiers Response GetRecord (id=1, prefix=oai_dc) Response Identifier:1 Identifier:2 Identifier:3 SERVICE PROVIDER DATA PROVIDER oai_dc oai_rfc record1

SIGIR 2001 Slide Scheduling Problems: Granularity is coarse Timezones are local for each site Solutions: Overlap one day to compensate for granularity Overlap one day or use remote times to compensate for timezones

SIGIR 2001 Slide Tools Check OAI website for sample code XML parsers – depending on platform – check W3C XML Schema validators Very few available – the reference version works but may not be easy to install Ignore validation if you can trust the source Sample data providers – check the OAI website for a list of conformant public archives

SIGIR 2001 Slide Repository Explorer

SIGIR 2001 Slide Case Study: ARC

SIGIR 2001 Slide Case Study: NDLTD Virginia TechU. OldenbergHumboldt U. NDLTD ETD Union Catalog VTLS VirtuaMARIAN Search/Browse Engines RecommenderCross-Ref. Other Services … …

SIGIR 2001 Slide VTLS Virtua

SIGIR 2001 Slide OAI Communities Shared Metadata Formats Shared semantics Layering over OAI Closed OAI networks OAI within the DL

SIGIR 2001 Slide Shared Metadata Formats Use metadata formats accepted within a community to convey more specific information Examples E-Print format (under development) ETD-MS for theses and dissertations VRA Core for multimedia IMS Metadata for educational material

SIGIR 2001 Slide Shared Semantics Develop a shared understanding for the meanings of fields Examples Developing controlled vocabularies for fields Using specific fields for external links (OAI recommends using identifier in DC for this) Choosing from among existing standards (like language names)

SIGIR 2001 Slide Layering over OAI Convert OAI records into more standard formats like MARC communications format Collapse multiple requests into one to make harvesting easier Name authority system (developed at OCLC) piggybacks name resolution over the OAI protocol

SIGIR 2001 Slide Closed OAI networks Data providers need not go public ! Within an organization, OAI can be used for data transfer among heterogeneous systems More control over use, making global optimizations possible (like harvesting schedules and choice of metadata formats)

SIGIR 2001 Slide OAI within the DL Use the OAI protocol as the basis for components to communicate Examples Search Engines could use dynamic sets to correspond to search results Browsing can be directed by sets Reviews and Annotations can each be independent OAI data providers

SIGIR 2001 Slide Now What ? Reality Check Links More Links

SIGIR 2001 Slide Reality Check DO I REALLY WANT TO DO THIS? Can I satisfy the requirements to be a data provider? Do I want to be a service provider ? Do I want to adopt and support this within my community ?

SIGIR 2001 Slide Links Open Archives Initiative OAI Metadata Harvesting Protocol Virginia Tech DLRL OAI Projects Repository Explorer NDLTD

SIGIR 2001 Slide More Links ARC Cross-Archive Search Service XML Schema Validator Dublin Core Metadata Initiative E-Prints DL-in-a-box XML Tools at W3C

SIGIR 2001 Slide 55 That’s All Folks !