Institutional Archives Technology Overview Michael L. Nelson Old Dominion University Institutional Archives.

Presentation transcript:

Institutional Archives Technology Overview Michael L. Nelson Old Dominion University Institutional Archives & Repositories: What this digital movement means for Federal Libraries Library of Congress Workshop September 12, 2003

Acknowledgements ODU: K. Maly, M. Zubair, J. Bollen LANL: R. Luce, X. Liu NASA: G. Roncaglia, J. Rocker Cornell: C. Lagoze, S. Warner MAGiC (UK): Paul Needham and, of course, Herbert Van de Sompel (LANL) –the OpenURL slides are nicked from his presentations

Outline A bit of history Core technologies –OAI-PMH –OpenURL Example implementations Download and go…

OAI-PMH

Background I met Herbert Van de Sompel in April –we spoke of a demonstration project he had in mind and had received sponsorship from Paul Ginsparg and Rick Luce –We wanted to demonstrate a multi-disciplinary DL that leveraged the large number of high quality, yet often isolated, tech report servers, e-print servers, etc. most digital libraries (DLs) had grown up along single disciplines or institutions –little to no interoperability; isolated DL “gardens”

Universal Preprint Service A cross-archive DL that provides services on a collection of metadata harvested from multiple archives –Nelson: NCSTRL+; a modified version of Dienst support for “clustering” support for “buckets” –Krichel: ReDIF metadata format –Van de Sompel: SFX Linking Demonstrated at Santa Fe NM, October 21-22, 1999 – –D-Lib Magazine, 6(2) 2000 (2 articles) –UPS was soon renamed the Open Archives Initiative (OAI)

Self-describing archives –Much of the learning about the constituent UPS archives occurred out of band… Data Providers –publishing into an archive –providing methods for metadata “harvesting” provide non-technical context for sharing information also Service Providers –harvest metadata from providers –implement user interface to data Data and Service Providers Even if these are done by the same DL, these are distinct roles

Metadata Harvesting Move away from distributed searching Extract metadata from various sources Build services on local copies of metadata –data remains at remote repositories [Diagram: a user searches for “cfd applications” against a local copy of metadata; metadata is harvested offline from each node; each node is independently maintained; all searching, browsing, etc. is performed on the local metadata; individual nodes can still support direct user interaction]
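A minimal sketch of the harvesting model described above, assuming a toy in-memory index. The repository names, record fields, and the `harvest` and `search` helpers are hypothetical, chosen only for illustration; a real service provider would harvest over OAI-PMH and use a proper search engine.

```python
# Hypothetical local metadata store: the service provider's own copy.
local_index = []

def harvest(repository, records):
    """Copy metadata records from one repository into the local index."""
    for record in records:
        local_index.append({"repository": repository, **record})

def search(query):
    """Search only the local copy; no remote repository is contacted."""
    terms = query.lower().split()
    return [r for r in local_index
            if all(term in r["title"].lower() for term in terms)]

# Harvesting happens offline, ahead of any user query.
harvest("repo-a", [{"id": "a1", "title": "CFD applications in wing design"}])
harvest("repo-b", [{"id": "b1", "title": "Survey of CFD Applications"},
                   {"id": "b2", "title": "Metadata formats"}])

hits = search("cfd applications")
print([(h["repository"], h["id"]) for h in hits])
```

Each hit still names the repository it came from, so the user can be sent back to the remote node for the data itself.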

Result… OAI The OAI was the result of the demonstration and discussion during the Santa Fe meeting –OAI = a bunch of people, a religion, a cult, etc. –OAI Protocol For Metadata Harvesting (OAI-PMH) = the protocol created and maintained by the OAI Initial focus was on federating collections of scholarly e-print materials… …however, interest grew and the scope and application of OAI-PMH expanded to become a generic bulk metadata transport protocol Note: –OAI-PMH is only about metadata -- not full text! but what is metadata vs. full-text? –OAI is neutral with respect to the nature of the metadata or the resources the metadata describes read: commercial publishers have an interest in OAI-PMH too...

Open Archives Initiative The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management still applies!) Archive defined as a “collection of stuff” -- not the archivist’s definition of “archive”. “Repository” used in most OAI documents. TLA; needed another vowel...

OAI-PMH Mechanics Request is encoded in http Response is encoded in XML XML Schemas for the responses are defined in the OAI-PMH document
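As a rough illustration of "request is encoded in http", the sketch below builds a request as an HTTP GET query string. The base URL and the record identifier are made up; `verb`, `identifier`, and `metadataPrefix` are argument names from the OAI-PMH 2.0 specification. No network request is made.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical repository base URL, for illustration only.
BASE_URL = "http://archive.example.org/oai"

def build_request(base_url, verb, **arguments):
    """Encode an OAI-PMH verb and its arguments as an HTTP GET URL."""
    params = {"verb": verb}
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)

request_url = build_request(
    BASE_URL,
    "GetRecord",
    identifier="oai:example.org:1234",  # made-up identifier
    metadataPrefix="oai_dc",            # unqualified Dublin Core
)
print(request_url)

# Decoding the query string recovers the same verb and arguments.
decoded = parse_qs(urlsplit(request_url).query)
print(decoded)
```

The XML response would then be validated against the schemas defined in the protocol document.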

Overview of OAI-PMH Verbs
Verb                  Function
Identify              description of archive
ListMetadataFormats   metadata formats supported by archive
ListSets              sets defined by archive
ListIdentifiers       OAI unique ids contained in archive
ListRecords           listing of N records
GetRecord             listing of a single record
Identify, ListMetadataFormats, ListSets: archival metadata
ListIdentifiers, ListRecords, GetRecord: harvesting verbs
most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)
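The resumption token mentioned above can be sketched as a simple loop: ask for a list, read the identifiers, and repeat with the token until the repository returns an empty one. This is an illustration under assumptions, not a full harvester: the two canned XML pages and the `fake_fetch` function stand in for real HTTP responses, and error handling is omitted.

```python
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Canned responses standing in for a repository (made-up identifiers).
PAGES = {
    None: """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
<header><identifier>oai:example.org:1</identifier><datestamp>2003-01-01</datestamp></header>
<header><identifier>oai:example.org:2</identifier><datestamp>2003-02-01</datestamp></header>
<resumptionToken>page2</resumptionToken></ListIdentifiers></OAI-PMH>""",
    "page2": """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>
<header><identifier>oai:example.org:3</identifier><datestamp>2003-03-01</datestamp></header>
<resumptionToken/></ListIdentifiers></OAI-PMH>""",
}

def fake_fetch(params):
    """Hypothetical transport: returns the XML a repository might send."""
    return PAGES[params.get("resumptionToken")]

def harvest_identifiers(fetch, metadata_prefix="oai_dc"):
    """Yield every identifier, following resumption tokens for flow control."""
    params = {"verb": "ListIdentifiers", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(params))
        for header in root.iter(OAI_NS + "header"):
            yield header.findtext(OAI_NS + "identifier")
        token = root.findtext(f".//{OAI_NS}resumptionToken")
        if not token:  # missing or empty token: the list is complete
            break
        # A resumption token is sent on its own, with only the verb.
        params = {"verb": "ListIdentifiers", "resumptionToken": token}

identifiers = list(harvest_identifiers(fake_fetch))
print(identifiers)
```

Swapping `fake_fetch` for a function that performs the HTTP GET would turn this into a (very bare) harvester.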

OAI-PMH Data Model [Diagram: a resource; its item, i.e. all available metadata about David; and the item's records: Dublin Core metadata, MARC metadata, SPECTRUM metadata] item = identifier record = identifier + metadata format + datestamp set-membership is item-level property
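One way to picture the data model is as two small classes. This is a sketch only: the class and field names are hypothetical, the identifier and metadata strings are placeholders, and the protocol itself defines no such classes.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Record:
    """record = identifier + metadata format + datestamp."""
    identifier: str
    metadata_prefix: str
    datestamp: str
    metadata: str

@dataclass
class Item:
    """item = identifier; set membership is an item-level property."""
    identifier: str
    sets: set = field(default_factory=set)
    records: dict = field(default_factory=dict)

    def add_record(self, metadata_prefix, datestamp, metadata):
        """Expose the item in one more metadata format."""
        record = Record(self.identifier, metadata_prefix, datestamp, metadata)
        self.records[metadata_prefix] = record
        return record

# One item, disseminated as two records in different formats.
item = Item("oai:example.org:david", sets={"sculpture"})
dc = item.add_record("oai_dc", "2003-09-12", "<dc>placeholder</dc>")
marc = item.add_record("marc", "2003-09-12", "<marc>placeholder</marc>")
print(sorted(item.records), item.sets)
```

Both records share the item's identifier; only the metadata format distinguishes them.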

Data Providers / Service Providers data providers (repositories) service providers (harvesters)

Aggregators data providers (repositories) service providers (harvesters) aggregator aggregators allow for: scalability for OAI-PMH load balancing community building discovery

Aggregators Frequently interchangeable terms: –aggregators: likely to be community / institutionally focused –caches: stores a copy, less likely to be community-oriented –proxies: less likely to store a copy, may gateway between OAI-PMH and other protocols Dienst / OAI Gateway; Harrison, Nelson, Zubair, JCDL 03 To learn more about aggregators, caches & proxies: – –

Example Aggregators Arc - –first described “hierarchical harvesting” in D-Lib Magazine, 7(4) Celestial - –among other services, it provides a history of harvests (successful vs. errors)

OAI-PMH 2.0 Registration Data Providers: Service Providers: 75 repositories registered ??? unregistered repositories unregistered because: testing / development not for public harvesting public, but “low-profile” never got around to it… ??? DP:SP ~= 5:1

Registration is Nice… …But Not Required OAI-PMH is (becoming) the “http” for digital libraries –there is no central registry of http servers remember the NCSA “What’s New” page? (ca. 1994) There will never be “registration support” in OAI-PMH –registries are a type of service provider, built on top of OAI-PMH –registration will be an integral part of community building –friends…

… harvester Identify NASA example

Field of Dreams It should be easy to be a data provider, even if it makes more work for the service provider. –if enough data providers exist, the service providers will come (DPs >> SPs) Open-source / freely available tools –“drop-in” data providers at the end of this presentation –tools to make your existing DL a data provider: also: OAI-implementers mailing list / mail archive! –service providers:

OAI-PMH Meeting History OAI Open Day, Washington DC 1/2001 2nd OAI Workshop CERN 10/2002 Protocol definition, development tools DPs, retrofitting existing DLs SPs, new services Socio-Economic- Political Issues

Shift of Topics From the protocol itself, supporting & debugging tools and how to retrofit (existing) DLs… …to building (new) services that use the OAI-PMH as a core technology and reporting on their impact to the institution/community

Arc harvests all known archives first end-user service provider source available through SourceForge hierarchical harvesting

NCSTRL metadata harvesting replacement for Dienst- based NCSTRL based on Arc computer science metadata

Archon physics metadata based on Arc features: –citation indexing –equation-based searching

Torii physics metadata features –personalization –recommendations –WAP access

iCite physics metadata features –citation based access to arXiv metadata

my.OAI covers all registered metadata features –result sets –personalization –many other advanced features

Cyclades scientific metadata features –personalization –recommendations –collaboration status?

citebase arXiv metadata citation based indexing, reporting

OAIster harvests all known archives

Others… Commercial publishers –American Physical Society (APS) –Institute of Physics –Elsevier / Scirus Department of Energy –OSTI –LANL Institutional servers –DSpace (MIT) –Eprints –DARE (All Dutch universities)

NACA Technical Report Server publicly available –began in 1996 –details in NASA TM scanned reports from –NACA = predecessor to NASA contents mirrored with the MAGiC project –a UK-based grey-literature preservation project –OAI-PMH used to mirror contents

NACA Report 1345 as seen through its native DL

NACA Report 1345 as seen through MAGiC

NACA Report 1345 as seen through Scirus (Elsevier)

NACA Report 1345 as seen through my.OAI (FS Consulting)

NASA Technical Report Server replacement for the previous distributed searching version of NTRS –MySQL –Va Tech harvester –modified “bucket” –details in Nelson, Rocker, Harrison, Library Hi-Tech, 21(2) (March 2003) a service provider & aggregator –same OAI baseURL as used for interactive searching

NASA Technical Report Server advanced, fielded search explicit query routing –10 NASA repositories –4 non-NASA repositories turned “off” by default

non-NASA repositories > 0.5M records

NASA DLs in the Larger STI Realm NTRS LTRS ATRS CASITRS … DOE DOD Universities Publishers ... International NTRS could also be a data provider from the point of view of other DLs, allowing the harvesting of NASA report metadata. NTRS could also harvest metadata from other DLs, and provide access to non-NASA content. We hope to influence the direction of the science.gov effort to use OAI-PMH; this could be a fully connected graph

Service Providers It is clear that SPs are proliferating, despite (because of?) the inherent bias toward DPs in the protocol –easy to be a DP -> many DPs -> SPs eventually emerge –hard to be a DP -> SPs starve –currently about 5x as many DPs as SPs SPs are beginning to offer increasingly sophisticated services –competitive market originally envisioned for SPs is emerging

OpenURL

Origins & Motivation The Context: Library Automation Environment anno 1998 distributed information environment local & remote A&I databases rapidly growing e-journal collection need to interlink the available information The Problem: links are delivered by info providers links are not sensitive to user’s context appropriate copy problem links dependent on business agreements between information vendors links don’t cover the complete collection

Origins & Motivation The Context: Library Automation Environment anno 1998 distributed information environment local & remote A&I databases rapidly growing e-journal collection need to interlink the available information The REAL Problem: libraries have no say in linking libraries are losing core part of the “organizing information” task expensive collection is not used optimally users are not well served

The Solution: In information services: DO NOT provide a link which is an actual service related to a referenced item (e.g. a link from a record in an A&I database to the corresponding full-text) BUT rather provide a link that transports metadata about the referenced item to others that are better placed to provide service links OpenURL Linking server operated by library

non-OpenURL linking [Diagram: link source with a reference; resolution of metadata into a link; link to referenced work; link destination (resource)]

OpenURL linking [Diagram: link source with a reference; provision of OpenURL; transportation of metadata & identifiers to the OpenURL linking server; user-specific, context-sensitive resolution of metadata & identifiers into services; several link destinations]
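The idea of a link that transports metadata to a library-run linking server can be sketched as follows. Everything here is an assumption made for illustration: the resolver URLs, the ISSN, the holdings tables, and the `make_openurl` and `resolve` helpers are invented, and the key names (`genre`, `issn`, `volume`, `spage`, `date`, `sid`) follow the OpenURL 0.1 style rather than the later NISO standard.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

def make_openurl(resolver_base, referent, source_id=None):
    """Build a link that carries metadata about the referent, not a target."""
    params = dict(referent)
    if source_id:
        params["sid"] = source_id  # identifies the link source
    return resolver_base + "?" + urlencode(params)

def resolve(openurl, holdings):
    """Toy linking server: turn transported metadata into local services."""
    query = parse_qs(urlsplit(openurl).query)
    issn = query.get("issn", [None])[0]
    services = []
    if issn in holdings:  # this library licenses the journal
        services.append(("full text", holdings[issn]))
    services.append(("interlibrary loan", "http://library.example.edu/ill"))
    return services

# Made-up referent; the same metadata goes to each user's own resolver.
referent = {"genre": "article", "issn": "1234-5678",
            "volume": "12", "spage": "45", "date": "1999"}
openurl_a = make_openurl("http://resolver.library-a.example.edu/menu",
                         referent, source_id="example:abstracts-db")
openurl_b = make_openurl("http://resolver.library-b.example.edu/menu",
                         referent, source_id="example:abstracts-db")

services_a = resolve(openurl_a, {"1234-5678": "http://publisher.example.com/fulltext"})
services_b = resolve(openurl_b, {})
print([name for name, _ in services_a])
print([name for name, _ in services_b])
```

The same reference yields different services for the two libraries, which is the context sensitivity the slides describe.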

Evolution ~ 1998 Nature of solution determined Experiment with local databases at Ghent University Demonstrated October 1998 at Belgian Library meeting Problem statement & Experiment described in 2 D-Lib Magazine papers, April 1999

Evolution ~ 1999 Feasibility of solution tested in 2 complex environments Experiments: & LANL, Ghent, APS, Wiley, SilverPlatter, Ex Libris UPS Prototype: arXiv, SLAC/SPIRES, LANL, Ghent, … Demonstrated: June 1999 at ALA LiTA session, New Orleans October 1999 at OAI meeting, Santa Fe Experiments described in 2 D-Lib Magazine papers, October 1999 and February 2000

Evolution ~ 2000 OpenURL 0.1 released Quick adoption of OpenURL 0.1 in information community SFX linking server goes beta

Evolution ~ 2001 Integration of OpenURL Framework and DOI/CrossRef framework Experiment involving CNRI, LANL, OhioLink, Academic Press, Ex Libris, … DOI/OpenURL integration described in 2 D-Lib Magazine papers, March 2001 and September 2001 First non-SFX linking servers appear

Evolution ~ 2001 Proposal to standardize OpenURL Generalization of OpenURL Framework concepts beyond scholarly information community Described in: Van de Sompel, Herbert and Beit-Arie, Oren. Generalizing the OpenURL Framework beyond References to Scholarly Works: the Bison-Futé model. July/August D-Lib Magazine. NISO AX Committee starts standardization of the OpenURL Framework using the Bison-Futé model as the basis of its work.

NISO OpenURL Standardization Charge Use existing “OpenURL Framework” as starting point notion of context-sensitive services notion of transporting “contextual” metadata packages to obtain context-sensitive services Define syntax and transport-method for “contextual” metadata packages Ensure extensibility: must support future applications must support other information communities => Generalize and Standardize

NISO OpenURL Standardization Charge Therefore, to be addressed were: OpenURL Framework beyond scholarly resources “contextual” metadata packages Syntax for “contextual” metadata packages Transport of “contextual” metadata packages

metadata plane resource1 resource2 resource3 default links herbert van de sompel default links: restricted in nature action-radius restricted by business agreements not context-sensitive

metadata plane extended services plane resource1 resource2 resource3 service component1 service component2 default links appropriate links OpenURL herbert van de sompel

Download and Go!

Where Do You Want to Build? user... data provider data provider data provider data provider service provider local context-sensitive services EPrints.org

Fedora joint project between Cornell & UVa –funded by the Mellon Foundation a repository management system –focuses on complex digital objects and their behaviors more info: – –D-Lib Magazine, 9(4)

DSpace MIT + HP Labs constructed to capture all the output of MIT’s faculty now generalized to the DSpace Federation –8 top universities in the US & Canada More info: – – –D-Lib Magazine 9(1)

EPrints.org developed at Southampton University –part of larger suite of institutional/author self-archiving tools and services e.g.: citebase; paracite widely adopted sites – more info – –

P2P publishing for academia –community servers for coordination, management –archivelets for individual laptops, PCs more info: – –D-Lib Magazine 7(4)

developed by UKOLN –open source OpenURL 0.1 format resolver –NISO 1.0 format??? more info: –Ariadne, 28 ftp://ftp.ukoln.ac.uk/metadata/tools/openresolver/

Conclusions

Why The OAI-PMH is NOT Important Users don’t care OAI-PMH is middleware –if done right, the uninterested user should never have to know OAI Inside Using OAI-PMH does not ensure a good SP OAI-PMH is (or is becoming) HTTP for DLs –few people get excited about http now http & OAI-PMH are core technologies whose presence is now assumed

Other Uses For the OAI-PMH Assumptions: –Traditional DLs / SPs will continue on their present path of increasing sophistication citation indexing, search results viz, personalization, recommendations, subject-based filtering, etc. –growth rates remain the same (5x DPs as SPs) Premise: OAI-PMH is applicable to any scenario that needs to update / synchronize distributed state –Future opportunities are possible by creatively interpreting the OAI-PMH data model See Van de Sompel, Young & Hickey, D-Lib Magazine July 2003,
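The "update / synchronize distributed state" premise can be illustrated with datestamp-based incremental harvesting. This is a sketch under assumptions: `list_records` is a stand-in for a ListRecords request with a `from` argument (treated as inclusive here), the records are plain dictionaries, the `deleted` flag is a simplified stand-in for deleted-record status, and ISO 8601 date strings are compared as text.

```python
def list_records(repository, from_=None):
    """Stand-in for ListRecords with an optional inclusive 'from' datestamp."""
    return [r for r in repository
            if from_ is None or r["datestamp"] >= from_]

def sync(local, repository, last_datestamp=None):
    """Bring the local copy up to date; return the newest datestamp seen."""
    newest = last_datestamp
    for rec in list_records(repository, from_=last_datestamp):
        if rec.get("deleted"):
            local.pop(rec["identifier"], None)  # propagate the deletion
        else:
            local[rec["identifier"]] = rec["metadata"]
        if newest is None or rec["datestamp"] > newest:
            newest = rec["datestamp"]
    return newest

# Made-up remote state, harvested in full the first time.
repo = [
    {"identifier": "oai:example.org:1", "datestamp": "2003-01-10", "metadata": "v1"},
    {"identifier": "oai:example.org:2", "datestamp": "2003-02-05", "metadata": "v1"},
]
local = {}
mark = sync(local, repo)

# The remote side changes: record 1 is deleted, record 3 is added.
repo[0] = {"identifier": "oai:example.org:1", "datestamp": "2003-03-01", "deleted": True}
repo.append({"identifier": "oai:example.org:3", "datestamp": "2003-03-02", "metadata": "v1"})
mark = sync(local, repo, mark)
print(sorted(local), mark)
```

Because the lower bound is inclusive, re-harvesting from the last datestamp may fetch a record twice, which is harmless since applying a record is idempotent.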

OpenURL Framework evolution A spec based on HTTP GET to transport metadata about a scholarly referent & the context in which the referent is referenced Draft Van de Sompel, Beit-Arie, Hochstenbach - 05/2001 A framework Standard that enables different Communities to: describe a referent describe the context in which the referent is referenced transport these descriptions NISO Draft Standard - 04/2003

The Future: Community Building Ultimately, protocols and metadata formats are not what makes a difference Rather, the critical mass afforded by a common set of utilities (cf. http, Dublin Core, XML) The best current example: The Open Language Archives Community – OAI-PMH provides the basis for communication between strangers, but allows even richer communication between friends