Harvesting Metadata Using OAI-PMH
Roy Tennant, California Digital Library


Outline
- The Open Archives Initiative
- OAI-PMH
- The Harvesting Process
- Harvesting Problems
- Steps to a Fruitful Harvest
- A Harvesting Service Model
- The OAI Future

Open Archives Initiative
- Aimed at making the large and growing number of repositories of freely available digital content interoperable
- The Protocol for Metadata Harvesting (OAI-PMH) specifies how repositories can expose their metadata for others to harvest
- Over 800 repositories world-wide support the protocol
- OAIster.org has indexed nearly 6 million items from over 500 of those repositories

OAI-PMH
- Data providers (DPs): those with the stuff
- Service providers (SPs): those who harvest metadata and provide aggregation and search services
- Software for both DPs and SPs is readily available
- OAI-PMH verbs:
  - Identify
  - ListIdentifiers
  - ListMetadataFormats
  - ListSets
  - ListRecords
  - GetRecord
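
Each verb is sent to a repository as an ordinary HTTP request with a `verb` parameter. A minimal sketch of building such request URLs follows; the helper name `build_request` and the base URL are hypothetical and used only for illustration.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs listed on the slide.
VERBS = ("Identify", "ListIdentifiers", "ListMetadataFormats",
         "ListSets", "ListRecords", "GetRecord")


def build_request(base_url, verb, **arguments):
    """Return the GET URL for one OAI-PMH request (hypothetical helper)."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    # The verb goes first, followed by any verb-specific arguments.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


# Hypothetical repository address, not a real data provider.
BASE = "https://repository.example.org/oai"
print(build_request(BASE, "Identify"))
print(build_request(BASE, "ListRecords", metadataPrefix="oai_dc"))
```

Keeping the verb list in one place lets a harvester reject a mistyped verb before any request is sent.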

OAI Architecture
Source: Open Archives Forum Tutorial

Identify
- Provides basic information about a repository

ListMetadataFormats
- Lists available metadata formats

ListIdentifiers
- Lists all identifiers (or only those of the optionally specified set)
- Must include the metadataPrefix argument

ListSets
- Lists available sets

Library of Congress ListSets response
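
The Library of Congress response itself is not reproduced in this transcript. As a rough sketch of what reading a ListSets response involves, the code below parses a small invented sample; the set names and the helper `parse_sets` are assumptions, while the namespace and the `set`, `setSpec` and `setName` elements come from the OAI-PMH 2.0 response format.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}

# Invented sample response, trimmed to the parts the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>maps</setSpec><setName>Historic Maps</setName></set>
    <set><setSpec>photos</setSpec><setName>Photograph Collection</setName></set>
  </ListSets>
</OAI-PMH>"""


def parse_sets(xml_text):
    """Return (setSpec, setName) pairs from a ListSets response."""
    root = ET.fromstring(xml_text)
    return [
        (s.findtext("oai:setSpec", namespaces=OAI_NS),
         s.findtext("oai:setName", namespaces=OAI_NS))
        for s in root.iterfind(".//oai:set", OAI_NS)
    ]


for spec, name in parse_sets(SAMPLE):
    print(spec, "-", name)
```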

ListRecords
- Lists all records (or only those of the optionally specified set)
- Must include the metadataPrefix argument

GetRecord
- Retrieves a specific record
- Must include the metadataPrefix and identifier arguments
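
The required arguments for each verb can be checked before a request is sent. This is a minimal sketch under one stated simplification: it ignores continuation requests made with a `resumptionToken`, which replace the other arguments. The names `REQUIRED` and `missing_arguments` are hypothetical.

```python
# Required arguments per verb for an initial (non-continuation) request.
REQUIRED = {
    "Identify": set(),
    "ListMetadataFormats": set(),
    "ListSets": set(),
    "ListIdentifiers": {"metadataPrefix"},
    "ListRecords": {"metadataPrefix"},
    "GetRecord": {"identifier", "metadataPrefix"},
}


def missing_arguments(verb, arguments):
    """Return the sorted names of required arguments that were not supplied."""
    return sorted(REQUIRED[verb] - set(arguments))


# Hypothetical identifier, for illustration only.
print(missing_arguments("GetRecord", {"identifier": "oai:example.org:1"}))
```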

The Harvesting Process
- Identifying Sources
- Selecting Sets
- Harvesting Metadata
- Processing
- Indexing
- Interface

A Harvesting Service Model

gita.grainger.uiuc.edu/registry/

errol.oclc.org

Selecting Sets
- Review the response to the ListSets verb
- May be instructive to search the collection in the native interface, if possible
- Look for descriptive pages on the site being harvested

Harvesting
- Many harvesting applications are available; I will focus on:
  - Public Knowledge Project (PKP) Harvester
  - Virginia Tech Perl Harvester
- Library software vendors increasingly offer harvesting products (e.g., ExLibris’ MetaIndex)

Virginia Tech Perl Harvester

    | Harvester Sample Configurator       |
    | Version 1.1 :: July 2002            |
    | Hussein Suleman                     |
    | Digital Library Research Laboratory |
    | :: Virginia Tech                    |

    Defaults/previous values are in brackets - press <enter> to accept those
    enter "&delete" to erase a default value
    enter "&continue" to skip further questions and use all defaults
    press <ctrl>-c to escape at any time (new values will be lost)

    Press <enter> to continue

    [ARCHIVES]
    Add all the archives that should be harvested
    Current list of archives:
    No archives currently defined !
    Select from: [A]dd [D]one
    Enter your choice [D] : a{return}

    [ARCHIVE IDENTIFIER]
    You need a unique name by which to refer to the archive you will harvest metadata from
    Examples: nsdl , VTETD
    Archive identifier [] : nsdl {return}

Let’s Harvest!

Indexing
- Pick your favorite database/indexing software:
  - MySQL
  - SWISH-E
  - Whatever is lying around…
- May need to specifically set up a method to search across the entire record
- May need different fields for indexing than for display
- Will need to deal with element collision
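
One way to support searching across the entire record is to add a catch-all field at indexing time. The sketch below assumes each harvested record has already been reduced to a dictionary that maps element names to lists of values; that representation, the field name `anywhere` and the sample record are all assumptions for illustration, and the sketch does not address element collision.

```python
def to_index_entry(record):
    """Build an index entry from a record given as {element: [values]}."""
    # One searchable string per element.
    entry = {name: " ".join(values) for name, values in record.items()}
    # One extra field concatenating every value, for whole-record searches.
    entry["anywhere"] = " ".join(
        value for values in record.values() for value in values
    )
    return entry


# Invented sample record.
record = {"title": ["Map of Berkeley"], "creator": ["Doe, Jane"], "date": ["1902"]}
print(to_index_entry(record)["anywhere"])
```

The display layer can keep using the original per-element values, so the fields used for indexing and for display stay separate.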

Interface
- Software interface (API) for other applications:
  - SRU/SRW?
  - MXG?
  - Arbitrary Web Services schema?
- User interface:
  - What functions do you want your users to be able to perform?
  - What kinds of displays do you want?

Harvesting Problems
- Sets
- Metadata Formats
- Metadata Artifacts
- Granularity
- Metadata Variances

Sets
- Records are harvested in clumps, called “sets”, created by DPs
- No guidelines exist for defining sets
- Examples:
  - Collection
  - Organizational structure
  - Format (but is a page image an image? See example)

Metadata Formats
- The only required format is simple Dublin Core, although any format can be made available in addition
- Few DPs surface richer metadata
- Simple DC is simply too simple!
- Example (artifact vs. surrogate dates)

Metadata Artifacts
- “unintended, unwanted aberrations”
- Sample causes:
  - Idiosyncratic local practices
  - Anachronisms
  - HTML code
- Examples:
  - Circa = string of dates for searching purposes
  - [electronic resource]
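
Two of these artifacts, embedded HTML code and the "[electronic resource]" qualifier, can be removed mechanically. A minimal sketch follows, assuming metadata values arrive as plain strings; the function name `clean_value` and the sample values are invented, and a production cleaner would need a more careful HTML parser than a regular expression.

```python
import html
import re

TAG = re.compile(r"<[^>]+>")                                     # any HTML tag
QUALIFIER = re.compile(r"\s*\[electronic resource\]", re.IGNORECASE)


def clean_value(value):
    """Strip HTML markup and the '[electronic resource]' qualifier."""
    # Unescape first so that tags stored as &lt;i&gt; are caught as well.
    value = html.unescape(value)
    value = TAG.sub("", value)
    value = QUALIFIER.sub("", value)
    # Collapse any whitespace left behind.
    return " ".join(value.split())


print(clean_value("Map of <i>Berkeley</i> [electronic resource]"))
```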

Granularity
- Record granularity: what is an “object”? A book, or each individual page?
  - Examples: CDL, Univ. of Michigan
- Metadata granularity: multiple values in one field
  - Example: Univ. of Washington
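
When several values are packed into one field, a service provider can split them back apart before indexing. The sketch below assumes a semicolon separator, which is a guess for illustration; the slides do not say which delimiter any particular data provider used.

```python
def split_values(values, separator=";"):
    """Split packed field values into one entry per value."""
    result = []
    for value in values:
        # Drop empty pieces left by trailing or doubled separators.
        result.extend(part.strip() for part in value.split(separator) if part.strip())
    return result


# Invented subject values.
print(split_values(["Salmon; Fishing; Rivers", "Boats"]))
```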

Metadata Variances
- Subject terminology differences
- Disparities in recording the same metadata
  - Example: date variances
- Mapping oddities or mistakes
  - Examples: 1) format into description, 2) description into subject

Steps to a Fruitful Harvest
- Needs Assessment (it’s the user, stupid)
- DP Identification and Communication
- Metadata Capture
- Metadata Analysis
- Metadata Subsetting
- Metadata Normalization
- Metadata Enrichment
- Indexing & Display
- Interface (it’s still the user, stupid)

Needs Assessment
- What are you trying to accomplish?
- What will your users want to be able to do?
- What metadata will you need, and what procedures will you need to set up to enable these activities?
- Which repositories have what you want?
- Is what they have (e.g., sets, metadata) usable as is, or ?

DP Identification & Communication
- Identification:
  - Use the UIUC directory of DPs to identify potential sources
- Communication:
  - You are not required to tell DPs that you are harvesting, but doing so may help establish a good relationship
  - You may want to request that they surface a richer metadata format and/or provide a different set

Metadata Capture
- Sample questions to answer:
  - Individual sets, or all?
  - Richer metadata formats available?
  - How frequently to reharvest?
  - Start from scratch each time or update?
- Many software options
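
These choices map onto ListRecords arguments: `set` selects an individual set, `metadataPrefix` selects the format, and `from` limits a reharvest to records changed since a given date. The sketch below is one possible harvest loop, not the method of any tool named in the slides. The function name `harvest` and the base URL are hypothetical, and the `fetch` callable is passed in so the loop can run without a network connection.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch, base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Yield <record> elements, following resumptionTokens until the list ends.

    fetch is any callable that takes a URL and returns the response XML as text.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec          # harvest one set instead of everything
    if from_date:
        params["from"] = from_date        # incremental update instead of a full reharvest
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(params)}"))
        yield from root.iter(f"{OAI}record")
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:                     # missing or empty token: the list is complete
            break
        # A continuation request carries only the verb and the token.
        params = {"verb": "ListRecords", "resumptionToken": token}
```

In real use, `fetch` could be a small wrapper such as `lambda url: urllib.request.urlopen(url).read().decode("utf-8")`, ideally with error handling and a pause between requests.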

Metadata Analysis
- Finding out what you have (and don’t have):
  - Encoding practices
  - Gap analysis (e.g., missing fields)
  - Mistakes (e.g., mapping errors)
- Software can help:
  - Commercial software like Spotfire
  - In-house or open source software tools
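
A simple in-house tool for gap analysis can be only a few lines long. The sketch below assumes each harvested record has been reduced to a dictionary mapping element names to lists of values; that representation, the function names and the sample data are assumptions for illustration.

```python
from collections import Counter


def element_usage(records):
    """Count, per element, how many records carry at least one non-empty value."""
    usage = Counter()
    for record in records:
        usage.update(
            name for name, values in record.items()
            if any(value.strip() for value in values)
        )
    return usage


def gaps(records, expected):
    """Return, per expected element, the number of records that lack it."""
    usage = element_usage(records)
    return {name: len(records) - usage[name] for name in expected}


# Invented sample records.
sample = [
    {"title": ["A"], "date": ["1900"]},
    {"title": ["B"], "date": [" "]},
    {"title": ["C"]},
]
print(gaps(sample, ["title", "date", "creator"]))
```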

Five elements are used 71% of the time
Source: 2002 Master’s Thesis, Jewel Hope Ward, UNC Chapel Hill

Metadata Subsetting
- DP sets are unlikely to serve all SP uses well
- SPs will need the ability to subset harvested metadata

Metadata Normalization
- Normalizing: to reduce to a standard or normal state
- Prototype date normalization service screen
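
The prototype screen is not reproduced in this transcript. As a rough sketch of what date normalization can mean in practice, the function below reduces free-form date strings to a (start year, end year) pair. The rule it applies, taking every four-digit year from 1000 to 2099 found in the string, is an assumption chosen for illustration and not the behavior of the prototype service.

```python
import re

# A four-digit year between 1000 and 2099 that is not part of a longer number.
YEAR = re.compile(r"(?<!\d)(1[0-9]{3}|20[0-9]{2})(?!\d)")


def normalize_date(value):
    """Return (earliest_year, latest_year) found in value, or None if there is none."""
    years = [int(year) for year in YEAR.findall(value)]
    if not years:
        return None
    return (min(years), max(years))


for raw in ("[2001 or 2002.]", "ca. 1902", "1902-03-14", "n.d."):
    print(repr(raw), "->", normalize_date(raw))
```

Returning a range instead of a single year keeps uncertain dates such as "2001 or 2002" searchable under either year.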

Metadata Enrichment
- Adding fields and/or qualifiers may be useful or required, for example:
  - Metadata provider information
  - Geographic coverage
  - Subject terms mapped to a different thesaurus
  - Authority control record
- The enrichment process may be the same tool as the subsetting tool (i.e., find a cluster of records and perform an action)
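
The "find a cluster of records and perform an action" idea can be sketched as two small functions, one for subsetting and one for enrichment. Records are assumed to be dictionaries mapping element names to lists of values; the element names `setSpec` and `provider` and the sample data are invented for illustration.

```python
def select(records, predicate):
    """Subsetting: return the records for which predicate(record) is true."""
    return [record for record in records if predicate(record)]


def enrich(records, element, value):
    """Enrichment: add value to element on every given record, skipping duplicates."""
    for record in records:
        values = record.setdefault(element, [])
        if value not in values:
            values.append(value)
    return records


# Invented sample data.
harvested = [
    {"title": ["Harbor view"], "setSpec": ["photos"]},
    {"title": ["County map"], "setSpec": ["maps"]},
]
cluster = select(harvested, lambda r: "photos" in r.get("setSpec", []))
enrich(cluster, "provider", "Example University Library")
print(harvested[0])
```

Because `select` returns references to the original dictionaries, enriching the cluster updates the harvested records in place.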

Indexing & Display
- Selected fields may need to be mapped to specific indexing and display elements
- Particularly required if harvesting different metadata formats
- But also needs to be done with multiple, conflicting fields: [2001 or 2002.] SHS 1,
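
Mapping fields from different metadata formats onto shared index elements can be driven by a lookup table. The table below is a hypothetical example, not a recommended crosswalk; the index field names `author` and `title`, the choice of elements and the sample record are assumptions.

```python
# (metadata format, source element) -> shared index field. Illustrative only.
FIELD_MAP = {
    ("oai_dc", "creator"): "author",
    ("oai_dc", "title"): "title",
    ("mods", "name"): "author",
    ("mods", "titleInfo"): "title",
}


def map_fields(metadata_format, record):
    """Map a record given as {element: [values]} onto the shared index fields."""
    mapped = {}
    for element, values in record.items():
        target = FIELD_MAP.get((metadata_format, element))
        if target:
            # Several source elements may feed the same index field.
            mapped.setdefault(target, []).extend(values)
    return mapped


# Invented sample record.
print(map_fields("mods", {"name": ["Doe, Jane"], "genre": ["thesis"]}))
```

Elements without an entry in the table are left out of the index, which makes unmapped fields easy to spot during analysis.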

A Harvesting Service Model

The OAI Future
- Further protocol development
- Services layered on top of OAI-PMH
- Shared software tools
- Best practices for both DPs and SPs

oai-best.comm.nsdl.org