The NSDL, OAI and Your Metadata
Core Infrastructure Metadata Repository (“union catalog”)
Naomi Dushay, Cornell University


OAI in the NSDL Infrastructure
[Diagram; labels: Your collection’s metadata; Your collection’s OAI server; NSDL Metadata Repository (MR); Your collection’s metadata, scrubbed & normalized; NSDL MR OAI server; NSDL Search Service; NSDL Archive Service; other OAI Services]

The Metadata Repository
- Designed to be scalable
- Based on automated harvest/expose model, with OAI at each end
- A notion of “normalized” metadata with Qualified Dublin Core as its base

Why do we normalize metadata?
- Improve services (e.g. search results, or UI display)
- Improve metadata quality, when possible
- Enhance predictability of data for reharvesting services

How do we normalize metadata?
- Perform “safe” transforms to “smarten up” metadata
- XSL stylesheets -- from your XML metadata to our normalized XML metadata
- Principles:
  - Do no harm (Don’t lose information)
  - Add information, when possible
    - Indicate schemes for valid values
  - Remove meaningless text
    - “…”, “not available”, “-”
    - Empty elements
  - Correct wrong information
    - “text/pdf” → “application/pdf”
  - Remove characters that impede functionality or display
    - Encoding fixes (e.g. “&”, double XML encodings, bad UTF-8 …)
    - Scrub URLs
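A few of these principles can be sketched in code. The slides say the NSDL does this with XSL stylesheets; the Python below is a stand-in for illustration only, not the NSDL's transform. The lookup tables, the `normalize` function and the sample record are all hypothetical, and the sketch assumes a flat record whose children are simple Dublin Core elements.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace

# Hypothetical lookup tables, for illustration only.
MEANINGLESS = {"", "...", "…", "-", "not available"}
MIME_FIXES = {"text/pdf": "application/pdf"}


def normalize(record_xml: str) -> str:
    """Return a cleaned copy of a flat Dublin Core record."""
    root = ET.fromstring(record_xml)
    for child in list(root):  # iterate over a copy so removal is safe
        text = (child.text or "").strip()
        if text.lower() in MEANINGLESS:
            root.remove(child)  # drop empty or meaningless elements
            continue
        if child.tag == f"{{{DC}}}format":
            text = MIME_FIXES.get(text, text)  # correct a known-wrong value
        child.text = text
    return ET.tostring(root, encoding="unicode")


raw = (
    f'<record xmlns:dc="{DC}">'
    "<dc:title>Plate Tectonics</dc:title>"
    "<dc:description>not available</dc:description>"
    "<dc:subject></dc:subject>"
    "<dc:format>text/pdf</dc:format>"
    "</record>"
)
cleaned = ET.fromstring(normalize(raw))
print([(el.tag.split("}")[1], el.text) for el in cleaned])
```

Only the title and the corrected format survive; the placeholder description and the empty subject are dropped.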

Automated MR Ingest process
- Your collection info and harvesting info is registered
- OAI validation – can we run our harvester on your OAI server? (see handout)
- OAI harvest of your metadata (nsdl_dc if available; oai_dc if not...)
- XML schema validation of all of your metadata
- UTF-8 encoding validation, and make bad UTF-8 chars into harmless ones.
- Normalized nsdl_dc created.
- Your metadata, “raw” and normalized, is loaded into the MR tables and made available to the NSDL’s MR OAI server.
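Two of the ingest steps are small enough to sketch: choosing which metadata format to harvest, and making bad UTF-8 harmless. Both function names are hypothetical. Replacing undecodable bytes with U+FFFD is an assumption about what "harmless" means here, since the slides do not say which replacement the NSDL uses.

```python
def choose_prefix(available: list[str]) -> str:
    """Prefer nsdl_dc; otherwise fall back to oai_dc."""
    return "nsdl_dc" if "nsdl_dc" in available else "oai_dc"


def scrub_utf8(raw: bytes) -> str:
    """Decode as UTF-8, turning each undecodable byte into U+FFFD."""
    return raw.decode("utf-8", errors="replace")


print(choose_prefix(["oai_dc", "nsdl_dc"]))
print(choose_prefix(["oai_dc", "marc21"]))
print(scrub_utf8(b"caf\xc3\xa9"))       # valid UTF-8 passes through unchanged
print(repr(scrub_utf8(b"caf\xe9")))     # a lone Latin-1 byte is replaced
```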

Automated MR ingest process
[Diagram; labels: NSDL Collection Registration; Your collection’s OAI server; OAI Harvest; “raw” or “native” metadata; Validation; Normalize; Validation; normalized metadata; Metadata Repository; NSDL MR OAI server; Notify collection of problems; May need to halt processing]

OAI-PMH: Key points
- OAI-PMH requests are embedded in HTTP
  - it’s a web service, not a flat file
  - XML, not HTML
- multiple metadata formats are allowed
  - OAI ≠ simple DC only!
  - Each metadata format MUST have a valid XML schema
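Because requests are embedded in HTTP, an OAI-PMH request is just a base URL plus a `verb` and its arguments in the query string. A minimal sketch, in which the helper name and `http://example.org/oai` are placeholders:

```python
from urllib.parse import urlencode


def oai_request_url(base_url: str, verb: str, **arguments: str) -> str:
    """Build an OAI-PMH GET request URL from a verb and its arguments."""
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


url = oai_request_url(
    "http://example.org/oai",  # hypothetical baseURL
    "ListRecords",
    metadataPrefix="oai_dc",
    set="physics",
)
print(url)
```

Since `from` is a reserved word in Python, that argument has to be passed as `**{"from": "2004-01-01"}`.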

Metadata Formats and Schemas

Format | XML namespace | XML Schema location | OAI metadataPrefix
Simple Dublin Core, OAI flavor | …chives.org/OAI/2.0/oai_dc/ | …hives.org/OAI/2.0/oai_dc.xsd | oai_dc
Qualified Dublin Core, latest NSDL flavor | …nsdl_dc_v1.02/ | …chemas/nsdl_dc/nsdl_dc_v1.02.xsd | (As you like; We use “nsdl_dc”)
Your format | (An appropriate URI) | (URL for an XML schema) | (As you like)

MR ingest requires: compliant OAI 2.0 server
- Correctly implements OAI-PMH; queries to all verbs respond correctly.
- Every OAI response must be (deeply) XML schema valid
- Encodes properly in proper places
  - XML encoding
  - URL encoding
  - UTF-8 encoding
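The three encodings apply in different places, which a short sketch can make concrete: XML encoding for text inside a response, URL encoding for argument values in a request, and UTF-8 for the bytes on the wire. The sample strings are invented for illustration.

```python
from urllib.parse import quote
from xml.sax.saxutils import escape

title = "Waves & Optics <intro>"
identifier = "oai:example.org:rec/1"  # hypothetical OAI identifier

xml_text = escape(title)              # XML encoding, for element content
url_value = quote(identifier, safe="")  # URL encoding, for a request argument
wire_bytes = "café".encode("utf-8")   # UTF-8 encoding, for the response bytes

print(xml_text)
print(url_value)
print(wire_bytes)
```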

OAI 2.0 – Identify
- baseURL
- address
- protocol version
- description for OAI identifier syntax, especially if adhering to oai-identifier syntax described in Implementation Guidelines

OAI 2.0 – ListMetadataFormats
- correct XML namespace for each format
- a valid XML schema for each format
  - targetNamespace MUST match XML namespace above
- super easy out: use oai_dc
- easy out: use nsdl_dc
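The targetNamespace rule can be checked mechanically. The sketch below compares the `metadataNamespace` in one `metadataFormat` entry with the `targetNamespace` of the schema it points to. The function name, the `my_format` prefix and the example.org URLs are hypothetical, and a real check would fetch the schema from the `schema` URL instead of using an inline string.

```python
import xml.etree.ElementTree as ET

OAI = "http://www.openarchives.org/OAI/2.0/"  # OAI-PMH 2.0 response namespace


def namespace_matches(format_xml: str, schema_xml: str) -> bool:
    """True if the format's metadataNamespace equals the schema's targetNamespace."""
    declared = ET.fromstring(format_xml).findtext(f"{{{OAI}}}metadataNamespace")
    target = ET.fromstring(schema_xml).get("targetNamespace")
    return declared is not None and declared == target


format_entry = f"""
<metadataFormat xmlns="{OAI}">
  <metadataPrefix>my_format</metadataPrefix>
  <schema>http://example.org/my_format.xsd</schema>
  <metadataNamespace>http://example.org/my_format/</metadataNamespace>
</metadataFormat>
"""
good_schema = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="http://example.org/my_format/"/>'
)
bad_schema = (
    '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" '
    'targetNamespace="http://example.org/other/"/>'
)
print(namespace_matches(format_entry, good_schema))
print(namespace_matches(format_entry, bad_schema))
```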

OAI 2.0 – ListSets
- super easy out: if all your metadata is NSDL relevant, don’t use sets for our sake.
- if you want the NSDL to harvest only SOME of your OAI server’s metadata, then use sets.
  - We will harvest only the sets you specify … but our default is to harvest all of them.
- super easy setSpec strings: use only alpha-num characters

OAI 2.0 – ListRecords
- Every metadata record served must (deeply) validate to its indicated XML schema
- If used, resumptionTokens must be implemented properly
  - RT is an exclusive argument
  - Last response has an empty RT
- Selective Harvesting works properly
  - “from” and “until” arguments do limit the results appropriately
  - “set” arguments do limit the results appropriately, if implemented
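From the harvester's side, the two resumptionToken rules look roughly like this. It is a sketch, not the NSDL harvester: `harvest` and `fake_fetch` are hypothetical, and the fetch function is injected so the example runs without a network. A real one would issue the HTTP request and return the response body.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Iterator

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def harvest(fetch: Callable[[dict], str], **first_args: str) -> Iterator[ET.Element]:
    """Yield every record, following resumptionTokens until an empty one."""
    args = {"verb": "ListRecords", **first_args}
    while True:
        root = ET.fromstring(fetch(args))
        yield from root.iter(f"{OAI}record")
        token = root.findtext(f".//{OAI}resumptionToken")
        if not token:  # absent or empty token: the list is complete
            return
        # Exclusive argument: send the token with the verb and nothing else.
        args = {"verb": "ListRecords", "resumptionToken": token}


def page(body: str) -> str:
    return f'<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListRecords>{body}</ListRecords></OAI-PMH>'


PAGES = {
    None: page("<record/><record/><resumptionToken>page2</resumptionToken>"),
    "page2": page("<record/><resumptionToken/>"),
}
requests_seen = []


def fake_fetch(args: dict) -> str:
    requests_seen.append(dict(args))
    return PAGES[args.get("resumptionToken")]


records = list(harvest(fake_fetch, metadataPrefix="oai_dc"))
print(len(records), requests_seen)
```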

Common Points of Confusion - 1
about the metadata vs. about the resource
- identifiers: OAI vs. DC
  - record/header/identifier vs. record/metadata/../dc:identifier
- dates: OAI vs. DC
  - record/header/datestamp vs. record/metadata/../dc:date
- OAI about containers are about the metadata
  - rights: OAI about vs. DC
    - record/about/../(dc:rights?) vs. record/metadata/../dc:rights
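The distinction is easier to see on an actual record. In the sketch below the header values describe the metadata record, while the `dc:` values describe the resource. The record itself is made up (example.org identifiers, invented dates).

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

RECORD = """
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>oai:example.org:rec-42</identifier>
    <datestamp>2004-03-15</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:identifier>http://example.org/lessons/tides.html</dc:identifier>
      <dc:date>1998-06-01</dc:date>
    </oai_dc:dc>
  </metadata>
</record>
"""

record = ET.fromstring(RECORD)

# About the metadata record: used for harvest and reharvest.
about_metadata = {
    "identifier": record.findtext("oai:header/oai:identifier", namespaces=NS),
    "datestamp": record.findtext("oai:header/oai:datestamp", namespaces=NS),
}
# About the resource the metadata describes.
about_resource = {
    "identifier": record.findtext(".//dc:identifier", namespaces=NS),
    "date": record.findtext(".//dc:date", namespaces=NS),
}
print(about_metadata)
print(about_resource)
```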

OAI identifiers
- Must uniquely identify individual metadata records at your site for OAI harvest and OAI reharvest
- Must stay the same for your metadata records
  - metadata is updated; OAI identifier unchanged

Common Points of Confusion - 2
- Dates
  - format confusion
    - OAI dates must be encoded as ISO8601 and must be in UTC (≈ GMT)
    - OAI-PMH allows YYYY-MM-DD and YYYY-MM-DDThh:mm:ssZ.
    - DC date encoding – “Recommended best practice for encoding the date value is defined in a profile of ISO 8601 [W3CDTF] and follows the YYYY-MM-DD format.”
  - (All OAI-PMH responses)
    - Time when OAI server responds to a request
    - OAI-PMH sez: ‘must be the time and date of the response in UTC. This is encoded using the "Complete date plus hours, minutes, and seconds" variant of ISO8601. This format is YYYY-MM-DDThh:mm:ssZ.’
  - (OAI-PMH / )
  - “from” and “until” arguments in OAI requests
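A minimal sketch of the two OAI-PMH date forms: one helper formats a timezone-aware time as a UTC timestamp, the other checks that a string has one of the two allowed shapes. Both names are hypothetical, and the pattern checks shape only, not whether the date exists on the calendar.

```python
import re
from datetime import datetime, timedelta, timezone

OAI_DATE = re.compile(r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2}Z)?")


def oai_timestamp(moment: datetime) -> str:
    """Format a timezone-aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def is_oai_date(value: str) -> bool:
    """True for YYYY-MM-DD or YYYY-MM-DDThh:mm:ssZ (shape check only)."""
    return OAI_DATE.fullmatch(value) is not None


# 09:30 at UTC-5 is 14:30 UTC.
local = datetime(2004, 3, 15, 9, 30, 0, tzinfo=timezone(timedelta(hours=-5)))
print(oai_timestamp(local))
print(is_oai_date("2004-03-15"), is_oai_date("2004-03-15T09:30:00-05:00"))
```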

When a Collection Deletes Records
- if not indicated in OAI server
  - incremental harvest for MR never shows update; MR copy never deleted!
- if indicated in OAI server transiently
  - reharvested soon enough –
  - not reharvested soon enough – incremental harvest for MR never shows update; MR copy never deleted!
- if indicated in OAI server persistently
  - MR finds delete on incremental harvest –

Deleted Records – Our Solution
- “Full reharvest”
  1. Mark all the site’s records in MR “deleted”
  2. Harvest all metadata records for the collection
  3. As we ingest each newly retrieved record into the MR, if we over-write an old record, “un-delete” it.
- Expensive
  - network bandwidth
  - processing time
- Okay for small collections (under ~15,000)
- Okay for metadata that changes infrequently
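The three steps of a full reharvest can be sketched with plain dictionaries. This is an illustration, not the MR's implementation: `full_reharvest`, the dictionary layout and the sample identifiers are hypothetical, and `harvested` stands in for the result of step 2.

```python
def full_reharvest(mr: dict, harvested: dict) -> None:
    """Update one site's MR entries in place.

    mr maps OAI identifier -> {"metadata": ..., "deleted": bool};
    harvested maps OAI identifier -> freshly harvested metadata.
    """
    for entry in mr.values():
        entry["deleted"] = True  # step 1: mark every record deleted
    for identifier, metadata in harvested.items():  # step 2: all harvested records
        # step 3: over-writing an old record also "un-deletes" it
        mr[identifier] = {"metadata": metadata, "deleted": False}


mr = {
    "oai:example.org:1": {"metadata": "old title", "deleted": False},
    "oai:example.org:2": {"metadata": "gone at source", "deleted": False},
}
harvested = {
    "oai:example.org:1": "new title",
    "oai:example.org:3": "brand new record",
}
full_reharvest(mr, harvested)
print(mr)
```

Record 2 is no longer served by the collection, so it stays marked deleted; records 1 and 3 end up live.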

In an ideal world, we’d like
- nsdl_dc
  - Information about nsdl_dc, example records and its XML schemas is in the NSDL Metadata Primer.
- Persistent deleted records
- OAI identifier syntax, per OAI Implementation Guidelines