A centre of expertise in digital information management The OAI Protocol for Metadata Harvesting Andy Powell UKOLN,


1 a centre of expertise in digital information management www.ukoln.ac.uk The OAI Protocol for Metadata Harvesting Andy Powell a.powell@ukoln.ac.uk UKOLN, University of Bath IVOA Registry Meeting, London March 2003

2 Contents
a brief history of OAI
10 technical things you should know about the OAI-PMH

3 OAI roots
the roots of OAI lie in the development of eprint archives…
–arXiv, CogPrints, NACA (NASA), RePEc, NDLTD, NCSTRL
each offered a Web interface for deposit of articles and for end-user searches
difficult for end-users to work across archives without learning multiple different interfaces
recognised need for a single search interface to all archives
–Universal Pre-print Service (UPS)

4 Searching vs. harvesting
two possible approaches to building a single search interface to multiple eprint archives…
–cross-searching multiple archives, based on a protocol like Z39.50
–harvesting metadata into one or more central services – bulk move data to the user interface
US digital library experience in this area indicated that cross-searching is not the preferred approach
–distributed searching of N nodes is viable, but only for small values of N

5 Searching vs. harvesting
(diagram of the two approaches; surviving labels: search service, …or…)

6 Harvesting requirements
for the harvesting approach to work there need to be agreements about…
–transport protocols – HTTP vs. FTP vs. …
–metadata formats – DC vs. MARC vs. …
–quality assurance – mandatory elements, mechanisms for naming of people, subjects, etc., handling duplicated records, best practice
–intellectual property and usage rights – who can do what with the records
work in this area resulted in the Santa Fe Convention

7 Development of OAI-PMH
2-year metamorphosis through various names
–Santa Fe Convention, OAI-PMH versions 1.0, 1.1…
–OAI Protocol for Metadata Harvesting 2.0
development steered by an international technical committee
inter-version stability helped developer confidence
move from a focus on eprints to a more generic protocol
–move from an OAI-specific metadata schema to mandatory support for DC

8 Bluffer's guide to OAI
1. OAI-PMH is a low-cost mechanism for harvesting metadata records
–from data providers to service providers
2. allows a service provider to say "give me some or all of your metadata records"
–where "some" is based on date-stamps, sets, metadata formats
3. not limited to repositories of eprints
–images, museum artefacts, learning objects, …
4. based on HTTP and XML
–simple, Web-friendly, autonomous
–fast, flexible deployment
http://www.openarchives.org/
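
Point 2 (selective harvesting) comes down to the arguments on a request URL. What follows is a minimal sketch, not a reference client: the base URL http://example.org/oai and the helper name build_harvest_url are hypothetical, while the argument names (verb, metadataPrefix, from, until, set) are the ones used by OAI-PMH 2.0.

```python
from urllib.parse import urlencode


def build_harvest_url(base_url, metadata_prefix="oai_dc",
                      from_date=None, until_date=None, set_spec=None):
    """Build a ListRecords URL asking for some or all of a repository's records.

    base_url is a hypothetical repository endpoint. Leaving out the optional
    arguments asks for "all" records; adding them narrows the request to "some".
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if from_date is not None:
        params["from"] = from_date      # only records stamped on or after this date
    if until_date is not None:
        params["until"] = until_date    # only records stamped on or before this date
    if set_spec is not None:
        params["set"] = set_spec        # only records in this set
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Everything, in simple Dublin Core.
    print(build_harvest_url("http://example.org/oai"))
    # Only records in the (hypothetical) "physics" set changed since 1 January 2003.
    print(build_harvest_url("http://example.org/oai",
                            from_date="2003-01-01", set_spec="physics"))
```

A harvester that remembers the date of its last visit can pass it as from_date on the next run, which is what makes incremental harvesting cheap.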

9 Bluffer's guide to OAI
5. OAI-PMH is not a search protocol
–but it can underpin search-based services built on Z39.50 or SRW or SOAP or…
6. OAI-PMH carries only metadata
–content (e.g. full-text or image) made available separately – typically at a URL given in the metadata
7. mandates simple DC as record format
–but extensible to any XML format – IMS, ONIX, MARC, METS, etc.
8. extensible framework for metadata about
–repository, resources, items, sets
–can include rights metadata

10 Bluffer's guide to OAI
9. metadata and content often made freely available – but not a requirement
–OAI-PMH can be used between closed groups
–or, can make metadata available but restrict access to content in some way
10. underlying HTTP protocol provides
–access control – e.g. HTTP BASIC
–compression mechanisms (for improving performance of harvesters)
–could, in theory, also provide encryption if required
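
Point 10 means a harvester gets access control and compression from ordinary HTTP headers, with nothing OAI-specific involved. The sketch below uses only the Python standard library; the helper names make_request and decode_body, the URL, and the credentials are hypothetical, and whether a given repository honours either header is an assumption to check per repository.

```python
import base64
import gzip
import urllib.request


def make_request(url, username=None, password=None):
    """Prepare an HTTP request that accepts gzip and optionally sends HTTP BASIC credentials."""
    req = urllib.request.Request(url)
    # Tell the server that a gzip-compressed response body is acceptable.
    req.add_header("Accept-Encoding", "gzip")
    if username is not None:
        # HTTP BASIC: base64 of "user:password" in the Authorization header.
        token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", "Basic " + token)
    return req


def decode_body(raw, content_encoding):
    """Undo gzip compression if the server applied it; otherwise return the bytes unchanged."""
    if content_encoding == "gzip":
        return gzip.decompress(raw)
    return raw


if __name__ == "__main__":
    req = make_request("http://example.org/oai?verb=Identify", "user", "pass")
    print(req.header_items())
    # To actually send it (needs a real repository):
    # with urllib.request.urlopen(req) as resp:
    #     xml_bytes = decode_body(resp.read(), resp.headers.get("Content-Encoding"))
```

BASIC credentials are only base64-encoded, not encrypted, so the encryption mentioned in the last bullet would in practice mean running the same requests over HTTPS.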

11 Resources, items and records
(diagram; surviving labels)
resource – David
item – all available metadata about David (item = identifier)
records – Dublin Core metadata, MARC metadata, SPECTRUM metadata

12 Protocol requests
six different request types
–Identify
–ListMetadataFormats
–ListSets
–ListIdentifiers
–ListRecords
–GetRecord
harvester need not use all types
repository must implement all types
each request type has required and optional arguments
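
The six request types and their arguments fit in a small lookup table, which is roughly what a repository checks before answering. This is a sketch based on my reading of OAI-PMH 2.0, not text from the slides: the table and function names are hypothetical, and the flow-control argument (resumptionToken) is deliberately left out because the slides do not cover it.

```python
# Required and optional arguments per request type (verb).
# Assumption: simplified from OAI-PMH 2.0; resumptionToken is omitted.
VERB_ARGUMENTS = {
    "Identify":            {"required": set(), "optional": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListSets":            {"required": set(), "optional": set()},
    "ListIdentifiers":     {"required": {"metadataPrefix"},
                            "optional": {"from", "until", "set"}},
    "ListRecords":         {"required": {"metadataPrefix"},
                            "optional": {"from", "until", "set"}},
    "GetRecord":           {"required": {"identifier", "metadataPrefix"},
                            "optional": set()},
}


def check_request(verb, args):
    """Return a list of problems with a request; an empty list means it looks valid.

    args is a dict of argument name to value (a dict rather than keyword
    arguments, because "from" is a reserved word in Python).
    """
    if verb not in VERB_ARGUMENTS:
        return [f"unknown verb: {verb}"]
    spec = VERB_ARGUMENTS[verb]
    problems = []
    for name in sorted(spec["required"] - set(args)):
        problems.append(f"missing required argument: {name}")
    for name in sorted(set(args) - spec["required"] - spec["optional"]):
        problems.append(f"unexpected argument: {name}")
    return problems


if __name__ == "__main__":
    print(check_request("Identify", {}))
    print(check_request("GetRecord", {"identifier": "oai:example.org:1"}))
    print(check_request("ListRecords", {"metadataPrefix": "oai_dc", "from": "2003-01-01"}))
```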

13 Record structure
a record is metadata about a resource in a particular XML format
header (mandatory)
–identifier (1)
–datestamp (1)
–setSpec elements (*)
–status attribute for deleted item (?)
metadata (mandatory)
–XML-encoded metadata within a root tag which provides namespace and schema
–repositories must support Dublin Core
about (optional)
–rights statements
–provenance statements
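
The header/metadata/about layout above can be read with any namespace-aware XML parser. A minimal sketch using Python's xml.etree.ElementTree follows; the sample record, its identifier, and the function name parse_record are made up for illustration, while the namespace URIs are the OAI-PMH 2.0, oai_dc and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Hypothetical record, shaped as it would appear inside a GetRecord or ListRecords response.
SAMPLE_RECORD = """<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:123</identifier>
    <datestamp>2003-03-01</datestamp>
    <setSpec>physics</setSpec>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>An example eprint</dc:title>
    </oai_dc:dc>
  </metadata>
</record>"""


def parse_record(xml_text):
    """Pull the header fields and the metadata root tag out of one record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    metadata = record.find("oai:metadata", NS)
    has_payload = metadata is not None and len(metadata) > 0
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),  # exactly one
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),    # exactly one
        "set_specs": [e.text for e in header.findall("oai:setSpec", NS)],  # zero or more
        "deleted": header.get("status") == "deleted",                    # optional attribute
        # The root tag of the payload carries the namespace of the metadata format.
        "metadata_root": metadata[0].tag if has_payload else None,
    }


if __name__ == "__main__":
    print(parse_record(SAMPLE_RECORD))
```

The function tolerates a record with no metadata element, which is how a record flagged as deleted would typically arrive.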

14 Dublin Core
OAI-PMH mandates use of simple DC as lowest common denominator
agreed XML schema – oai_dc
–simple DC – 15 metadata properties
–all DC properties optional and repeatable

Title        Contributor  Source
Creator      Date         Language
Subject      Type         Relation
Description  Format       Coverage
Publisher    Identifier   Rights

http://dublincore.org/
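
Since every simple DC property is optional and repeatable, an oai_dc payload is essentially a flat list of elements drawn from the 15 names. The sketch below serialises one under that assumption; the function name to_oai_dc and the sample values are hypothetical, and the xsi:schemaLocation attribute that a real repository would add to the root element is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# The 15 simple Dublin Core properties, as XML element names.
DC_ELEMENTS = (
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
)


def to_oai_dc(fields):
    """Serialise a dict of property name -> list of values as an oai_dc:dc element.

    Any property may be absent (optional) or carry several values (repeatable).
    """
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a simple DC property: {name}")
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(to_oai_dc({
        "title": ["An example eprint"],
        "creator": ["A. Author", "B. Author"],              # repeated property
        "identifier": ["http://example.org/eprints/123"],   # where the content lives
    }))
```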

15 OAI demonstration
repository explorer demo

16 OAI and Google
(diagram; surviving labels)
data sources – Web site(s), multimedia database(s), eprint archive(s)
DP9 gateway – OAI gateway makes harvested metadata available to Google…

17 Implementing OAI
OAI protocol is relatively simple
implementation and deployment tend to be very fast
lots of available toolkits
–Java, Perl, PHP, etc.
complete tools also available
–e.g. tools that sit in front of existing databases
see the tools area on the OAI Web site…

18 Creative Commons
CC is devoted to expanding the range of creative work available for others to build upon and share
provides standard licences for content
–attribution
–noncommercial
–no derivative works
–share alike
mechanisms for indicating licence on Web pages
need similar mechanism in OAI
http://www.creativecommons.org/

19 Questions…

20 a centre of expertise in digital information management www.ukoln.ac.uk

