Open Archives Initiative: Where we are, Where we are going. Carl Lagoze, 4th OAF Workshop, September 2003.

Where we are now
De facto standard for Internet information exchange
Deployed extensively and internationally
  – (digital) libraries
  – Museums
  – Eprint repositories
  – Research projects

Protocol Stability
OAI-PMH has been stable since release
  – No functional changes, just typographic edits
  – Validation of leadership/participation model
No plans for a 3.0 release
  – Core protocol will not be extended
  – Minor 2.x release could occur (more later)
  – Additional implementation guidelines (more later)

NSDL and OAI-PMH

The NSDL Context
National STEM (Science, Technology, Engineering, Mathematics, Medicine) Digital Library
Major National Science Foundation project targeted at the application of web and Internet to (STEM) education
$25M over six years to over 100 projects
  – Collections
  – Services
  – Targeted Research
  – Core Integration

NSDL technical guidelines
Aggregation rather than collection
  – Core integration team will not manage any collections
Spectrum of interoperability
  – Accommodate diversity of participation models
  – Open interfaces and standards permitting plug in of array of value-added services
One library, many portals
  – Accommodate multiple quality and selection metrics
  – Tailor presentation of content and nature of services to audience needs

Spectrum of interoperability
Level | Agreements | Example
Federation | Strict use of standards (syntax, semantic, and business) | AACR, MARC, Z
Harvesting | Digital libraries expose metadata; simple protocol and registry | Open Archives metadata harvesting
Gathering | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines

Translating to initial goals
This is a big task that no one has done before!
Work on the priorities
  – Focus on one point on spectrum of interoperability
      Metadata harvesting
      Incorporate NSF funded collections and selected other collections
  – Leverage existing (or at least emerging) technologies and protocols
      OAI, uPortal, Shibboleth, SDLIP, InQuery
  – Provide reliable base level services
      Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence
Plant some seeds for the future
  – Machine-assisted metadata generation
  – Automated collection aggregation
  – Web gathering strategies

Metadata Repository (MR)
Central storage of all metadata about all resources in the NSDL
  – Defines the extent of NSDL collection
  – Metadata includes collections, items, annotations, etc.
MR main functions
  – Aggregation
  – Normalization
  – Redistribution
Ingest of metadata by various means
  – Harvesting, manual, automatic, cross-walking
Open access to MR contents for service builders via OAI-PMH
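
A service builder's request against the MR's OAI-PMH interface might be assembled as in the minimal sketch below. The verb and argument names (ListRecords, metadataPrefix, from, until, set) are those of OAI-PMH; the base URL and the helper name list_records_url are hypothetical and only stand in for whatever endpoint the MR actually exposes.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, from_date=None,
                     until_date=None, set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    # Optional selective-harvesting arguments are added only when given.
    if from_date:
        params["from"] = from_date
    if until_date:
        params["until"] = until_date
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Assumed endpoint, for illustration only.
url = list_records_url("http://mr.example.org/oai", "oai_dc",
                       from_date="2003-01-01")
print(url)
```

The request only names what to harvest; parsing the XML response and following resumption tokens are left out of this sketch.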

Metadata Strategy
Collect and redistribute any native (XML) metadata format
Provide crosswalks to Dublin Core from standard formats
  – DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD
Concentrate on collection-level metadata
Use automatic generation to augment item-level metadata
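
To make the idea of a crosswalk concrete, here is a deliberately tiny sketch that maps three MARC fields to Dublin Core elements. The three mappings (100 to creator, 245 to title, 650 to subject) follow the commonly published MARC to Dublin Core correspondence, but the table is far from complete, and the function name and the (tag, value) input shape are assumptions made for illustration, not the NSDL implementation.

```python
# Simplified, incomplete mapping table (illustration only).
MARC_TO_DC = {"100": "creator", "245": "title", "650": "subject"}


def crosswalk(marc_fields):
    """Map a list of (tag, value) pairs to a dict of Dublin Core elements."""
    dc = {}
    for tag, value in marc_fields:
        element = MARC_TO_DC.get(tag)
        if element is None:
            continue  # fields without a mapping are dropped in this sketch
        dc.setdefault(element, []).append(value)
    return dc


record = [("245", "Open Archives"), ("100", "Lagoze, Carl"),
          ("650", "Metadata"), ("650", "Interoperability"),
          ("999", "local note")]
dc_record = crosswalk(record)
print(dc_record)
```

Every Dublin Core element is repeatable, which is why each element maps to a list of values.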

Importing metadata into the MR
Collections → Harvest → Staging area → Cleanup and crosswalks → Database load → Metadata Repository
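
The import stages can be sketched as a small in-memory pipeline. Everything here is hypothetical: the collection names, the record fields, and the use of a plain dict as the repository are assumptions chosen to keep the example runnable, and the crosswalk step is omitted.

```python
def harvest(collections):
    """Harvest: yield every record, tagged with its source collection."""
    for name, records in collections.items():
        for record in records:
            yield dict(record, collection=name)


def clean(record):
    """Cleanup: trim whitespace and drop fields that end up empty."""
    cleaned = {}
    for key, value in record.items():
        value = value.strip()
        if value:
            cleaned[key] = value
    return cleaned


def load(staged, repository):
    """Database load: store each staged record under its identifier."""
    for record in staged:
        repository[record["identifier"]] = record
    return repository


collections = {
    "collection-a": [{"identifier": "a:1", "title": "  Fractions  ",
                      "description": ""}],
    "collection-b": [{"identifier": "b:7", "title": "Orbits"}],
}
staging_area = [clean(r) for r in harvest(collections)]
repository = load(staging_area, {})
print(repository)
```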

Exporting metadata from the MR

NSDL and OAI-PMH: Two years later
Concepts are good, practice is hard
Issues
  – Metadata is hard
  – XML is hard
  – Protocols are hard
      Static repositories (more later)
  – IP is relevant (more later)

Some Essential Metadata Questions
Review original (DC) metadata assumptions
  – Metadata is essential for good resource discovery
  – Joe Sixpack could create metadata
Account for current realities
  – 2003 is not 1994
  – Google, etc. keep getting better

Metadata Space

Metadata Triage

Reconsidering the Dublin Core Requirement
Questions about utility of unqualified DC
  – The conundrum…
      Specification too loose to serve intended interoperability goal
      But more complex metadata may be too hard
Limited energy for interoperability
  – Data providers implement required DC at expense of better metadata
Use of protocol for purposes other than resource discovery

Rethinking record-oriented model
Implications for record-oriented harvesting?

Topology Evolution: Simple Data Provider, Service Provider Topology

Topology Evolution (cont.): Metadata Aggregator

Topology Evolution (cont.): OAI-PMH p2p network

OAI-P2P-MH Issues
Document (metadata) location
  – Exploit unique identifiers, use efficient key-based location mechanisms (distributed hash tables)
Provenance-based queries
  – Metadata records may go through refinement and/or translation phases as they move through value-added aggregators
  – Exploit provenance guidelines
Network harvesting
  – Broadcast query (Gnutella) inefficient
  – Exploit techniques for efficient routing of queries (P-trees)
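
Key-based location can be illustrated with consistent hashing on a ring of peers. This is a simplified stand-in for a distributed hash table, not a full DHT: there is no routing between peers, no replication, and no handling of peers joining or leaving. The peer URLs, the identifier, and the class name Ring are assumptions made for the sketch.

```python
import hashlib
from bisect import bisect_right


def _key(text):
    """Map a string to a point on the ring using SHA-1."""
    return int(hashlib.sha1(text.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Assign each OAI identifier to the next peer clockwise on the ring."""

    def __init__(self, peers):
        self._points = sorted((_key(p), p) for p in peers)
        self._keys = [k for k, _ in self._points]

    def locate(self, identifier):
        # Wrap around to the first peer when the key is past the last point.
        i = bisect_right(self._keys, _key(identifier)) % len(self._points)
        return self._points[i][1]


peers = ["http://peer-a.example.org/oai", "http://peer-b.example.org/oai",
         "http://peer-c.example.org/oai"]
ring = Ring(peers)
owner = ring.locate("oai:example.org:1234")
print(owner)
```

Because the lookup depends only on the identifier and the peer list, any node holding the same peer list finds the same owner without broadcasting a query.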

OAI-PMH and Intellectual Property
Protocol exists in a context where information providers have concerns about use of intellectual property
OAI-PMH is nominally about metadata, but…
  – Rich metadata is an intellectual product
  – The protocol can be used to transmit anything (e.g. content) that can be encoded in XML
  – Generally metadata leads to content so…

OAI-rights Effort
Goal is to investigate and develop means of expressing rights about metadata and resources in the OAI framework.
The result will be an addition to the OAI implementation guidelines that specifies mechanisms for rights expressions within OAI-PMH.
  – No changes to core protocol

OAI-rights Effort (cont.)
Extensible, providing a general framework for expressing rights statements within OAI-PMH.
  – Not an effort to develop a new rights expression language
Use Creative Commons licenses as a motivating and deployable example.
Release of specification by 2nd quarter 2004
Invited OAI-rights group
  – Standard OAI development model

Dimensions of OAI-PMH and rights: Entity Association
Metadata: concern in NSDL for (re)use of rich metadata
Content: predominant application of the protocol to resource discovery and ultimate access makes this important

Dimensions of OAI-PMH and rights: Aggregation Association
OAI-PMH aggregations
  – Repository
  – Set
  – Item
Rights association with an aggregation may provide shortcut (e.g., the rights for all resources in a repository/set…)
Cost of shortcut is pseudo-statefulness, possibly complex overriding rules
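
One way to picture the overriding rules is a lookup in which the most specific aggregation wins: item, then set, then repository. This precedence is an assumption made for illustration; the slides leave the rules open, and the rights URLs, identifiers, and function name below are all hypothetical.

```python
def resolve_rights(item_id, set_specs, rights):
    """Return the rights statement that applies to an item.

    Assumed precedence: item-level, then the first matching set,
    then the repository-wide default.
    """
    items = rights.get("items", {})
    if item_id in items:
        return items[item_id]
    sets = rights.get("sets", {})
    for set_spec in set_specs:
        if set_spec in sets:
            return sets[set_spec]
    return rights.get("repository")


rights = {
    "repository": "http://rights.example.org/default",
    "sets": {"physics": "http://rights.example.org/open"},
    "items": {"oai:example.org:42": "http://rights.example.org/restricted"},
}
print(resolve_rights("oai:example.org:42", ["physics"], rights))
print(resolve_rights("oai:example.org:7", ["physics"], rights))
print(resolve_rights("oai:example.org:9", ["math"], rights))
```

Even this small sketch shows the cost named above: a harvester must hold repository-level and set-level statements in order to interpret a single record.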

Dimensions of OAI-PMH and rights: Binding
Choices
  – exploit mechanisms in metadata formats, e.g., DC-rights
  – restrict the rights statements to some more specific protocol mechanism
  – allow some mixture of these methods
DC-rights problems
  – Semantics is restricted to rights about resource
  – Can't embed XML in dc value
  – What if DC is not required
Burden on harvesters if rights embedding is not explicit but scattered across several locations

OAI-PMH Static Repositories
Provide a lightweight mechanism for data provider participation
Intended for relatively small and static collections
Two components
  – Static Repository XML format
      Semantically equivalent to Identify and ListRecords
      Invisible to harvester
  – Static Repository Gateway
      Virtual data provider for static repository data
      Unique baseURL for each contained static repository
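
The "unique baseURL for each contained static repository" point can be sketched by deriving the baseURL from the gateway's own URL plus the location of the static repository file. The scheme below (gateway URL, a slash, then the static repository URL without its scheme prefix) is one possible convention assumed here for illustration; the URLs and the function name are hypothetical.

```python
def gateway_base_url(gateway_url, static_repo_url):
    """Derive a per-repository baseURL under a gateway (assumed scheme)."""
    # Drop the "http://" (or similar) prefix from the static repository URL.
    host_and_path = static_repo_url.split("://", 1)[-1]
    return gateway_url.rstrip("/") + "/" + host_and_path


base_url = gateway_base_url("http://gateway.example.org/oai",
                            "http://an.example.edu/repo.xml")
print(base_url)
```

A harvester then treats each derived baseURL as an ordinary OAI-PMH data provider, which is what keeps the static file invisible to it.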

Static Repositories and Static Repository Gateway

Static Repositories: Open Issue
Relationship to RSS?

Conclusions
Interoperability and lowest common denominator
Rapid advances in automated methods
  – Moore's law
  – Smart algorithms
  – Benefits of issues of scale
Combining human effort and automated methods
  – Extracting order from chaos
  – Learning from order
Move beyond resource discovery

Typical Values
repository – collection of publications
resource – scholarly publication
item – all metadata (DC + MARC)
record – a single metadata format
datestamp – last update / addition of a record
metadata format – bibliographic metadata format
set – originating institution or subject categories
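
The relationship between item and record listed above can be sketched as a small data model: an item holds all metadata about a resource, and each record is that metadata in a single format with its own datestamp. The class and field names are assumptions for illustration; only the vocabulary (item, record, datestamp, metadataPrefix, set) comes from OAI-PMH.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Metadata about a resource in one format."""
    metadata_prefix: str
    datestamp: str
    metadata: str


@dataclass
class Item:
    """All metadata about one resource, keyed by metadataPrefix."""
    identifier: str
    set_specs: list = field(default_factory=list)
    records: dict = field(default_factory=dict)

    def add_record(self, record):
        self.records[record.metadata_prefix] = record

    def get_record(self, metadata_prefix):
        return self.records.get(metadata_prefix)


item = Item("oai:example.org:1", set_specs=["institution:example"])
item.add_record(Record("oai_dc", "2003-09-01", "<dc>...</dc>"))
item.add_record(Record("oai_marc", "2003-09-03", "<marc>...</marc>"))
print(sorted(item.records))
```

The "what if" slides that follow amount to swapping what these slots hold while leaving the shape of the model alone.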

Repositories…
Stretching the idea of a repository a bit:
  – contextually sensitive repositories
      personalization for harvesters
      communication between strangers, or communication between friends?
  – OAI-PMH for individual complex objects?
      OAI-PMH without MySQL?!
      Fedora, Multi-valent documents, buckets
      tar, jar, zip, etc. files

Resource
What if resource were:
  – computer system status
      uptime, who, w, df, ps, etc.
  – or generalized system status
      e.g., sports league standings
  – people
      personnel databases
      authority files for authors

Item
What if item were:
  – software
      union of versions + formats
  – all forms of metadata
      administrative + structural
      citations, annotations, reviews, etc.
  – data
      e.g., newsfeeds and other XML expressible content
  – metadataPrefixes or sets could be defined to be different versions

Record
What if record were:
  – specific software instantiations / updates
  – access / retrieval logs for DLs (or computer systems)
  – push / pull model inversion
      put a harvester on the client behind a firewall; the client contacts a DP and receives instructions on how to submit the desired document (e.g., send to a specified address)

Datestamp
Semantics of datestamp are strongly influenced by the choice of resource / item / record / metadataPrefix, but it could be used to:
  – signify change of set membership (e.g., workflow: item moves from submitted to approved)
  – change datestamp to reflect access to the DP
      e.g., in conjunction with metadataPrefixes of accessed or mirrored
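
The workflow idea can be sketched as follows: moving an item from one set to another also updates its datestamp, so a harvester that asks only for changes since its last visit picks up the move. The record layout, the set names, and both function names are hypothetical; dates are ISO strings, which compare correctly as plain text.

```python
def move_to_set(record, new_set, today):
    """Change set membership and bump the datestamp to signal the change."""
    record["set"] = new_set
    record["datestamp"] = today


def changed_since(records, from_date):
    """Identifiers of records whose datestamp is on or after from_date."""
    return [r["identifier"] for r in records if r["datestamp"] >= from_date]


records = [
    {"identifier": "oai:example.org:1", "set": "submitted",
     "datestamp": "2003-08-01"},
    {"identifier": "oai:example.org:2", "set": "submitted",
     "datestamp": "2003-08-02"},
]
move_to_set(records[0], "approved", "2003-09-15")
changed = changed_since(records, "2003-09-01")
print(changed)
```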

metadataPrefix
What if metadataPrefix were:
  – instructions for extracting / archiving / scraping the resource
      verb=ListRecords&metadataPrefix=extract_TIFFs
  – code fragments to run locally (harvested from a trusted source!)
  – XSLT for other metadataPrefixes
      branding container is at the repository-level; this could be record- or item-level

Set
Sets are already used for tunneling OAI-PMH extensions (see Suleman & Fox, D-Lib 7(12))
Other uses:
  – in aggregators, automatically create 1 set per baseURL
  – have hidden sets (or metadataPrefix) that have administrative or community-specific values (or triggers)
      set=accessed>1000&from=
      set=harvestMeWithTheseARGS&until= &metadataPrefix=oai_marc
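
Creating one set per baseURL in an aggregator needs a way to turn each baseURL into a legal setSpec, since OAI-PMH restricts the characters a setSpec may contain. The normalization below (lowercase, drop the scheme, replace every run of other characters with an underscore) is just one assumed convention, and the URLs and function names are hypothetical.

```python
import re


def set_spec_for(base_url):
    """Derive a setSpec from a harvested repository's baseURL (assumed rule)."""
    stripped = re.sub(r"^[a-z]+://", "", base_url.lower())
    return re.sub(r"[^a-z0-9]+", "_", stripped).strip("_")


def group_by_source(harvested):
    """Group record identifiers into one set per source baseURL."""
    sets = {}
    for base_url, identifier in harvested:
        sets.setdefault(set_spec_for(base_url), []).append(identifier)
    return sets


harvested = [
    ("http://repo.example.org/oai", "oai:repo.example.org:1"),
    ("http://repo.example.org/oai", "oai:repo.example.org:2"),
    ("http://Archive.example.edu/cgi/oai2", "oai:archive.example.edu:9"),
]
sets = group_by_source(harvested)
print(sets)
```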