OAI Protocol for Metadata Harvesting & Its Usefulness to STM Publishers Timothy W. Cole (t-cole3@uiuc.edu) Mathematics Librarian & Professor of Library.

Presentation transcript:

OAI Protocol for Metadata Harvesting & Its Usefulness to STM Publishers Timothy W. Cole (t-cole3@uiuc.edu) Mathematics Librarian & Professor of Library Administration University of Illinois at Urbana-Champaign 2005 Allen Press Emerging Trends Seminar National Press Club, Washington, D.C. 13 April 2005

OAI Protocol for Metadata Harvesting
- 'Harvesting' approach to interoperability at the metadata level
- Divides the world into Metadata Providers & Service Providers
- Builds on HTTP, XML, & community metadata standards
- http://www.openarchives.org/
Timothy W. Cole (t-cole3@uiuc.edu) University of Illinois at UC OAI-PMH & Its Usefulness to STM Publishers 2005 Allen Press Seminar, 13 April 2005

Metadata Harvesting Model
[Diagram: End-Users send [Search] requests to an OAI Service Provider, whose OAI Harvester gathers Aggregated Metadata from the OAI Providers at the Metadata & Content Repositories (metadata held as, e.g., XML or SQL); [Retrieval] of Content runs between the End-Users and the repositories.]

Metadata Harvesting Model (cont.)
- OAI Service Provider (harvester) is the middleman between content provider and end-user for selected metadata-based transactions, e.g.:
  - Resource discovery
  - Value-added link mediation
- Transactions involving full content are still conducted directly between end-users and content provider, e.g.:
  - Delivery of the complete article in the desired format
- OAI-PMH is not synonymous with Open Access
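
The division of labor on this slide can be sketched in a few lines of Python. This is only an in-memory illustration, not a real harvester: the provider addresses, titles, and the `harvest` and `search` helpers are all hypothetical, and no OAI-PMH requests are actually sent. It shows discovery running against the aggregated metadata while each hit still links back to the content provider.

```python
# Hypothetical providers: each exposes metadata records that point back to
# content hosted on the provider's own site.
PROVIDERS = {
    "http://publisher-a.example/oai": [
        {"title": "Protein Folding Kinetics",
         "url": "http://publisher-a.example/articles/1"},
        {"title": "Kinetics of Enzyme Inhibition",
         "url": "http://publisher-a.example/articles/2"},
    ],
    "http://journal-b.example/oai": [
        {"title": "Limit Theorems in Probability",
         "url": "http://journal-b.example/papers/7"},
    ],
}


def harvest(providers):
    """Service provider side: copy metadata from every provider into one list."""
    aggregated = []
    for base_url, records in providers.items():
        for record in records:
            # Remember which provider each record came from.
            aggregated.append({**record, "source": base_url})
    return aggregated


def search(aggregated, term):
    """Resource discovery runs against the aggregated metadata only."""
    term = term.lower()
    return [r for r in aggregated if term in r["title"].lower()]


aggregated = harvest(PROVIDERS)
for hit in search(aggregated, "kinetics"):
    # The link sends the end-user back to the content provider for the article.
    print(hit["title"], "->", hit["url"])
```

The service provider never stores or serves the articles themselves, which is why access control over full content stays with the publisher.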

How OAI-PMH Works
OAI "verbs":
- Identify
- ListMetadataFormats
- ListSets
- ListIdentifiers
- ListRecords
- GetRecord
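
As a rough illustration of how the six verbs appear on the wire, the sketch below builds HTTP GET URLs for a few requests. The base URL `http://example.org/oai`, the identifier `oai:example.org:123`, and the helper name `build_request` are assumptions made for the example; `verb`, `metadataPrefix`, and `identifier` are request arguments defined by the protocol.

```python
from urllib.parse import urlencode

# The six OAI-PMH verbs named on the slide.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}


def build_request(base_url, verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request."""
    if verb not in OAI_VERBS:
        raise ValueError(f"not an OAI-PMH verb: {verb}")
    # The verb goes first; any other arguments follow as key=value pairs.
    query = urlencode({"verb": verb, **arguments})
    return f"{base_url}?{query}"


BASE_URL = "http://example.org/oai"  # hypothetical repository address

print(build_request(BASE_URL, "Identify"))
print(build_request(BASE_URL, "ListRecords", metadataPrefix="oai_dc"))
print(build_request(BASE_URL, "GetRecord",
                    identifier="oai:example.org:123",
                    metadataPrefix="oai_dc"))
```

A harvester would typically start with `Identify` and `ListMetadataFormats` to learn what a repository offers before asking for records.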

Protocol Details
- OAI Transaction = OAI request (HTTP) & corresponding OAI response (XML)
- Transactions initiated by harvester
- Optional flow control mechanisms to manage provider load
- OAI Item Identifiers: persistent & unique
- Item (Metadata) Date Stamps: support selective harvesting
- OAI supports multiple metadata formats
  - Distinguishes between an ITEM (complete metadata) & a RECORD (disseminated item of metadata in given format)
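
The item/record distinction and date-stamp-based selective harvesting can be illustrated with a small in-memory model. This is a sketch under simplifying assumptions: the `Item` class, the helper names, the sample identifiers, and the `marc21` format key are hypothetical, and a single date stamp per item stands in for whatever a real repository tracks.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    """Everything a repository holds about one resource, in all formats."""
    identifier: str                 # persistent and unique
    datestamp: date                 # date the metadata last changed
    formats: dict = field(default_factory=dict)  # metadataPrefix -> metadata


def get_record(item, metadata_prefix):
    """Disseminate one item in one metadata format: the result is a record."""
    if metadata_prefix not in item.formats:
        raise KeyError(f"{item.identifier} has no {metadata_prefix} metadata")
    return {
        "identifier": item.identifier,
        "datestamp": item.datestamp,
        "metadata": item.formats[metadata_prefix],
    }


def select_items(items, from_date=None, until_date=None):
    """Selective harvesting: keep items changed within the given date range."""
    selected = []
    for item in items:
        if from_date is not None and item.datestamp < from_date:
            continue
        if until_date is not None and item.datestamp > until_date:
            continue
        selected.append(item)
    return selected


ITEMS = [
    Item("oai:example.org:1", date(2005, 1, 10),
         {"oai_dc": {"title": "First Article"}}),
    Item("oai:example.org:2", date(2005, 3, 2),
         {"oai_dc": {"title": "Second Article"},
          "marc21": {"245a": "Second Article"}}),
]

# A harvester that last visited on 1 February 2005 only needs newer items.
for item in select_items(ITEMS, from_date=date(2005, 2, 1)):
    print(get_record(item, "oai_dc"))
```

One item can therefore yield several records, one per metadata format the repository supports for it.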

Reliance on HTTP & XML
- OAI-PMH is a REpresentational State Transfer (REST) protocol (unlike RPC, SOAP)
- OAI requests and responses are sent via the HTTP protocol
- OAI requests are encoded as HTTP GET or POST operations
- OAI responses are valid XML documents
- Consistency and data "quality" are ensured by using XML Schema Definitions (XSD) for all responses
- XML Namespaces are used to identify which parts of a response are metadata and which parts support the Protocol
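
To show how namespaces separate protocol elements from metadata, the sketch below parses a hand-written GetRecord response with Python's standard `xml.etree.ElementTree`. The sample response, its identifier, and the article title are made up for illustration, and the function name `parse_get_record` is an assumption; the three namespace URIs are the ones used by OAI-PMH 2.0 and Dublin Core.

```python
import xml.etree.ElementTree as ET

# Namespaces: one for the protocol envelope, one for the Dublin Core metadata.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A made-up response of the kind a repository might return for GetRecord.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-04-13T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:123"
           metadataPrefix="oai_dc">http://example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:123</identifier>
        <datestamp>2005-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def parse_get_record(xml_text):
    """Pull protocol fields and Dublin Core titles out of one response."""
    root = ET.fromstring(xml_text)
    # Protocol parts live in the OAI-PMH namespace.
    header = root.find("oai:GetRecord/oai:record/oai:header", NS)
    # Metadata parts live in the Dublin Core namespace.
    titles = [el.text for el in root.iterfind(".//dc:title", NS)]
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "titles": titles,
    }


print(parse_get_record(SAMPLE_RESPONSE))
```

This sketch does not validate the response against the XSD; a production harvester would likely add that step and handle error responses.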

What it takes to implement OAI
- Dynamic Web server functionality (e.g., CGI)
- Capacity to respond with XML
- Descriptive metadata in a standard format
- OAI persistent identifiers & date stamps may require changes to metadata creation workflow
- Open source implementations available (starting points)
- OAI-PMH included in turnkey publishing solutions:
  - Public Knowledge Project (UBC)
  - Open Repository (BioMed Central), ...
  - Eprints.org, DSpace, Fedora, ARNO, CDSware, ...
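
A very small sketch of the provider side follows: a function that takes the query string of an incoming request and returns an XML response. It is an assumption-laden illustration rather than a complete repository. The repository name, base URL, and admin address are hypothetical, only `Identify` is answered, and the output is not checked against the OAI-PMH schema. A CGI script or any other dynamic Web handler could call such a function.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

OAI_NS = "http://www.openarchives.org/OAI/2.0/"
BASE_URL = "http://example.org/oai"  # hypothetical repository address

VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

# Hypothetical answers to the Identify verb.
REPOSITORY_INFO = [
    ("repositoryName", "Example Journal Archive"),
    ("baseURL", BASE_URL),
    ("protocolVersion", "2.0"),
    ("adminEmail", "admin@example.org"),
    ("earliestDatestamp", "2001-01-01"),
    ("deletedRecord", "no"),
    ("granularity", "YYYY-MM-DD"),
]


def handle_request(query_string, response_date="2005-04-13T12:00:00Z"):
    """Turn an OAI-PMH query string into an XML response string."""
    args = parse_qs(query_string)
    verb = args.get("verb", [""])[0]

    root = ET.Element("OAI-PMH", {"xmlns": OAI_NS})
    ET.SubElement(root, "responseDate").text = response_date
    ET.SubElement(root, "request").text = BASE_URL

    if verb not in VERBS:
        # Unknown or missing verbs get a protocol-level error, not an HTTP one.
        error = ET.SubElement(root, "error", {"code": "badVerb"})
        error.text = "Illegal or missing OAI verb"
    elif verb == "Identify":
        identify = ET.SubElement(root, "Identify")
        for name, value in REPOSITORY_INFO:
            ET.SubElement(identify, name).text = value
    else:
        raise NotImplementedError(f"{verb} is left out of this sketch")

    return ET.tostring(root, encoding="unicode")


print(handle_request("verb=Identify"))
```

The remaining verbs need access to the metadata store, which is where the identifier and date stamp workflow changes mentioned on the slide come in.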

Provider Performance Issues
- Database design has the biggest impact on performance, e.g., load to dynamically map to DC, other formats
- Webserver performance load can be kept quite low
- Use resumptionTokens, other flow control mechanisms to improve performance
  - Fetch only records needed to satisfy current request
  - resumptionTokens should retain state information for best performance and for idempotency
- Scale example: OCLC repository with 4+ million records
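
One possible way to make a resumptionToken carry its own state is sketched below. The token format (base64-encoded JSON holding an offset), the page size, and the function names are assumptions chosen for illustration; the protocol treats the token as opaque and does not prescribe its contents. The sketch also assumes the underlying store does not change while a harvest is in progress.

```python
import base64
import json

PAGE_SIZE = 2  # deliberately tiny so the paging is visible


def encode_token(state):
    """Pack the paging state into an opaque, URL-safe string."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    """Recover the paging state from a token issued by encode_token."""
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def list_identifiers(store, metadata_prefix=None, token=None):
    """Return one page of identifiers plus the token for the next page."""
    if token is not None:
        state = decode_token(token)
    else:
        state = {"prefix": metadata_prefix, "offset": 0}

    start = state["offset"]
    # Fetch only the slice needed to satisfy the current request.
    page = store[start:start + PAGE_SIZE]

    next_offset = start + len(page)
    next_token = None
    if next_offset < len(store):
        next_token = encode_token(
            {"prefix": state["prefix"], "offset": next_offset})
    return page, next_token


# Hypothetical store of five identifiers.
STORE = [f"oai:example.org:{n}" for n in range(1, 6)]

harvested, token = list_identifiers(STORE, "oai_dc")
while token:
    page, token = list_identifiers(STORE, token=token)
    harvested.extend(page)
print(harvested)
```

Because the token itself holds the state, repeating a request with the same token returns the same page, which is the idempotency the slide asks for, and the server keeps no per-harvester session.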

OAI Implementation Guidelines for Repositories
- Tools Required
- Basic program strategies (incl. object-oriented approaches)
- Guidance for use of optional container elements
- Metadata generation / mapping, data cleaning
- Use of OAI Sets
- resumptionToken, flow control, load-balancing
- Denial-of-service prevention
- Error handling
- Strategies for deleted metadata records
http://www.openarchives.org/OAI/2.0/guidelines-repository.htm

Why OAI?
- OAI is not synonymous with open access: content provider maintains access control over full content
- Implement once, provide metadata to multiple services
- Less performance impact than robotic Web harvesting
- Simpler than Z39.50
- Puts your metadata in additional portals
- But, less control over:
  - How your metadata is presented to end-user
  - What your metadata is put next to by service providers
- How valuable a commodity is your metadata?

Who's Using OAI to Expose Metadata
- OAI Data Provider Registry (http://oai.grainger.uiuc.edu/registry)
- As of 1 March 2005: 607 active OAI metadata provider repositories
  - Range in size from millions of items to fewer than 100 items
  - More than half are institutional repositories or eprint archives
  - Handful of publishers / publisher-aggregators, e.g.: PubMed Central; BioOne; BioMed Central (partial); Project Euclid; Africa Journals Online; Institute of Physics (user id & password); American Physical Society (restricted access); ...
  - Individual journals, e.g.: J. of STEM Education; Electronic J. of Probability; J. of Cognitive Affective Learning; Canadian J. of Communication; ...

Who's Harvesting Metadata Using OAI-PMH
- Portals encouraging Open Access, e.g.:
  - OAIster; Public Knowledge Project; Citebase; Cyclades; ...
  - NSDL (STEM Education); NCSTRL (computer science); SAIL (physical science e-prints); ...
- Local harvesting projects
  - As a way to share data internally
  - As a collation service to their users, e.g., Grainger Search Service
  - OAI harvesting supported by some Library meta-search utilities
- Web search engines that use OAI as one input stream
  - Yahoo! ingests from OAIster; Google looking to harvest DSpace sites; Scirus includes OAI metadata; ...
  - mod_OAI (Apache Web servers) as an alternative to Web robotic harvesting?

Indirect Benefits from OAI-PMH
From Bibusages study (French National Library):
- Digital Libraries are used in conjunction with Web search engines, generalist portals, commercial sites
- Mix of intensive & casual users
- DL users seeking answer for specific information need; most time spent discovering, viewing, & downloading documents
- "Digital Libraries … are now attracting a new type of public, bringing about new, unique and original ways for reading and understanding texts."
Houssem Assadi, et al. "Users & Uses of Online Digital Libraries in France," ECDL 2003

Evolution of Scholarly Communication
- Ubiquitous nature of electronic pre-prints & post-prints
- Extensive linking to supporting content on the Web
- Mixing of author-paid publication with traditional subscription-based business models (e.g., AIP, Springer trials)
- Citation frequency up for articles also available in arXiv:
  - Demographic and citation trends in Astrophysical Journal papers and preprints / Greg J. Schwarz and Robert C. Kennicutt, Jr. BAAS 36:1654-1663, 2004 [also: http://arxiv.org/abs/astro-ph/0411275]
- Some publishers encouraging self-archiving of pre-prints
  - IMS; APS; AIP; ... [see http://www.sherpa.ac.uk/romeo.php]
- OAI-PMH underpins these kinds of self-archiving services

A Librarian’s Perspective
The information landscape can be seen as a contour map in which there are mountains, hillocks, valleys, plains and plateaus…. A specialized collection of particular importance is like a sharp peak. Upon a plateau there might be undulations representing strengths and weaknesses…. The landscape is, however, multidimensional. Where one scholar may see a peak another may see a trough. The task is to devise mapping conventions which enable scholars to read the map of the landscape fruitfully, at the appropriate level of generality or specificity.
Michael Heaney (2000), “An Analytical Model of Collections and their Catalogues.”