1 OAI Protocol for Metadata Harvesting & Its Usefulness to STM Publishers
Timothy W. Cole (t-cole3@uiuc.edu)
Mathematics Librarian & Professor of Library Administration
University of Illinois at Urbana-Champaign
2005 Allen Press Emerging Trends Seminar
National Press Club, Washington, D.C.
13 April 2005

2 OAI Protocol for Metadata Harvesting
- ‘Harvesting’ approach to interoperability at the metadata level
- Divides the world into Metadata Providers & Service Providers
- Builds on HTTP, XML, & community metadata standards
Timothy W. Cole, University of Illinois at UC. OAI-PMH & Its Usefulness to STM Publishers. 2005 Allen Press Seminar, 13 April 2005

3 Metadata Harvesting Model
[Diagram. Labels: End-Users; OAI Service Provider (OAI Harvester, Aggregated Metadata); Metadata & Content Repositories, each with an OAI Provider, Content, and Metadata (e.g. XML in one, SQL in the other). End-users [Search] at the service provider and [Retrieval] of content goes to the repositories.]

4 Metadata Harvesting Model (cont.)
- OAI Service Provider (harvester) is the middleman between content provider and end-user for selected metadata-based transactions, e.g.:
  - Resource discovery
  - Value-added link mediation
- Transactions involving full content are still conducted directly between end-users and content provider, e.g.:
  - Delivery of complete article in desired format
- OAI-PMH is not synonymous with Open Access

5 How OAI-PMH Works
OAI “verbs”:
- Identify
- ListMetadataFormats
- ListSets
- ListIdentifiers
- ListRecords
- GetRecord
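As a rough illustration of how the six verbs become HTTP requests, the sketch below builds GET URLs with the Python standard library. The base URL and the helper name `build_request` are hypothetical and not from the talk; `metadataPrefix`, `from`, and `oai_dc` are standard OAI-PMH names.

```python
from urllib.parse import urlencode

# Hypothetical repository base URL, used only for illustration.
BASE_URL = "https://repository.example.org/oai"

# The six OAI-PMH verbs listed on the slide.
VERBS = ("Identify", "ListMetadataFormats", "ListSets",
         "ListIdentifiers", "ListRecords", "GetRecord")


def build_request(verb, **arguments):
    """Return the HTTP GET URL for one OAI-PMH request."""
    if verb not in VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    # "from" is a Python keyword, so callers pass from_ and the
    # trailing underscore is stripped here.
    for name, value in arguments.items():
        query[name.rstrip("_")] = value
    return f"{BASE_URL}?{urlencode(query)}"


identify_url = build_request("Identify")
records_url = build_request("ListRecords", metadataPrefix="oai_dc",
                            from_="2005-03-01")
print(identify_url)
print(records_url)
```

A harvester would typically start with Identify and ListMetadataFormats to learn what a repository offers before issuing ListRecords.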

6 Protocol Details
- OAI transaction = OAI request (HTTP) & corresponding OAI response (XML)
- Transactions are initiated by the harvester
- Optional flow control mechanisms to manage provider load
- OAI item identifiers: persistent & unique
- Item (metadata) date stamps: support selective harvesting
- OAI supports multiple metadata formats
- Distinguishes between an ITEM (complete metadata) & a RECORD (disseminated item of metadata in a given format)
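The item/record distinction and datestamp-based selective harvesting can be sketched with a small in-memory model. This is an illustrative assumption about how a provider might organize its data, not something described in the talk; the class `Item`, the functions `get_record` and `select`, and the identifiers are hypothetical. `cannotDisseminateFormat` is the protocol's error code for an unsupported format.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Item:
    identifier: str   # persistent, unique OAI identifier
    datestamp: date   # date the item's metadata last changed
    # metadataPrefix -> serialized metadata in that format
    formats: dict = field(default_factory=dict)


def get_record(item, metadata_prefix):
    """A record is one item disseminated in one metadata format."""
    if metadata_prefix not in item.formats:
        raise KeyError("cannotDisseminateFormat")
    return {
        "identifier": item.identifier,
        "datestamp": item.datestamp.isoformat(),
        "metadata": item.formats[metadata_prefix],
    }


def select(items, from_=None, until=None):
    """Selective harvesting: keep items stamped within [from_, until]."""
    return [
        item for item in items
        if (from_ is None or item.datestamp >= from_)
        and (until is None or item.datestamp <= until)
    ]


# Hypothetical sample data.
items = [
    Item("oai:example.org:1", date(2005, 1, 15), {"oai_dc": "<dc/>"}),
    Item("oai:example.org:2", date(2005, 3, 1), {"oai_dc": "<dc/>"}),
]
recent = select(items, from_=date(2005, 2, 1))
print([item.identifier for item in recent])
```

A harvester that remembers the date of its last visit can pass it as `from_` and receive only what changed since then.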

7 Reliance on HTTP & XML
- OAI-PMH is a REpresentational State Transfer (REST) protocol (unlike RPC, SOAP)
- OAI requests and responses are sent via HTTP
- OAI requests are encoded as HTTP GET or POST operations
- OAI responses are valid XML documents
- Consistency and data “quality” are ensured by using XML Schema Definitions (XSD) for all responses
- XML Namespaces identify which parts of a response are metadata and which parts support the protocol
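To show how namespaces separate protocol elements from metadata, here is a minimal sketch that parses a hand-written, abbreviated GetRecord response with the standard library. The sample record is invented for illustration and has not been validated against the OAI-PMH schema; the three namespace URIs are the standard OAI-PMH, oai_dc, and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",           # protocol
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",  # DC container
    "dc": "http://purl.org/dc/elements/1.1/",                 # metadata
}

# Hypothetical, abbreviated response for illustration only.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2005-04-13T12:00:00Z</responseDate>
  <request verb="GetRecord" identifier="oai:example.org:1"
           metadataPrefix="oai_dc">https://repository.example.org/oai</request>
  <GetRecord>
    <record>
      <header>
        <identifier>oai:example.org:1</identifier>
        <datestamp>2005-03-01</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>An Example Article</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>"""

root = ET.fromstring(SAMPLE)

# Protocol part: elements in the OAI-PMH namespace.
header = root.find("oai:GetRecord/oai:record/oai:header", NS)
identifier = header.findtext("oai:identifier", namespaces=NS)
datestamp = header.findtext("oai:datestamp", namespaces=NS)

# Metadata part: elements in the Dublin Core namespace.
title = root.findtext(".//dc:title", namespaces=NS)

print(identifier, datestamp, title)
```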

8 What it takes to implement OAI
- Dynamic Web server functionality (e.g., CGI)
- Capacity to respond with XML
- Descriptive metadata in a standard format
- OAI persistent identifiers & date stamps may require changes to metadata creation workflow
- Open source implementations available (starting points)
- OAI-PMH included in turnkey publishing solutions:
  - Public Knowledge Project (UBC)
  - Open Repository (BioMed Central), ...
  - Eprints.org, DSpace, Fedora, ARNO, CDSware, ...
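As a sense of scale for "capacity to respond with XML", the sketch below generates a bare-bones Identify response with the standard library. The function name and all argument values are hypothetical; the element names are the ones the protocol requires for Identify. A production response would also carry an `xsi:schemaLocation` attribute, omitted here for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def identify_response(base_url, name, admin_email, earliest):
    """Build a minimal Identify response as an XML string."""
    # Serialize the OAI-PMH namespace as the default namespace.
    ET.register_namespace("", OAI_NS)
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    ET.SubElement(root, f"{{{OAI_NS}}}responseDate").text = stamp
    request = ET.SubElement(root, f"{{{OAI_NS}}}request", verb="Identify")
    request.text = base_url
    identify = ET.SubElement(root, f"{{{OAI_NS}}}Identify")
    fields = [
        ("repositoryName", name),
        ("baseURL", base_url),
        ("protocolVersion", "2.0"),
        ("adminEmail", admin_email),
        ("earliestDatestamp", earliest),
        ("deletedRecord", "no"),          # assumption: no deletion tracking
        ("granularity", "YYYY-MM-DD"),    # assumption: day-level datestamps
    ]
    for tag, value in fields:
        ET.SubElement(identify, f"{{{OAI_NS}}}{tag}").text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical repository details.
xml_text = identify_response("https://repository.example.org/oai",
                             "Example Journal Archive",
                             "admin@example.org", "2001-01-01")
print(xml_text)
```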

9 Provider Performance Issues
- Database design has the biggest impact on performance
  - e.g., load to dynamically map to DC, other formats
- Webserver performance load can be kept quite low
- Use resumptionTokens, other flow control mechanisms to improve performance
  - Fetch only records needed to satisfy current request
  - resumptionTokens should retain state information for best performance and for idempotency
- Scale example: OCLC repository with 4+ million records
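One way to make resumptionTokens carry state, sketched under assumptions: the token below simply encodes the original request arguments plus an offset, so the provider fetches only the slice it needs and reissuing the same token returns the same page. The token format, page size, and function names are hypothetical; the protocol treats tokens as opaque and leaves their design to the repository.

```python
import base64
import json

PAGE_SIZE = 2  # deliberately tiny so the example pages several times


def encode_token(state):
    """Pack request state into an opaque, URL-safe token."""
    raw = json.dumps(state, sort_keys=True).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_token(token):
    return json.loads(base64.urlsafe_b64decode(token.encode("ascii")))


def list_identifiers(identifiers, metadata_prefix=None, resumption_token=None):
    """Provider side: return one page and the token for the next, if any."""
    if resumption_token is not None:
        state = decode_token(resumption_token)
    else:
        state = {"metadataPrefix": metadata_prefix, "offset": 0}
    start = state["offset"]
    # Fetch only the records needed to satisfy the current request.
    page = identifiers[start:start + PAGE_SIZE]
    next_offset = start + PAGE_SIZE
    token = None
    if next_offset < len(identifiers):
        token = encode_token({**state, "offset": next_offset})
    return page, token


def harvest(identifiers, metadata_prefix):
    """Harvester side: follow tokens until the list is complete."""
    collected = []
    page, token = list_identifiers(identifiers, metadata_prefix=metadata_prefix)
    collected.extend(page)
    while token is not None:
        page, token = list_identifiers(identifiers, resumption_token=token)
        collected.extend(page)
    return collected


# Hypothetical identifiers.
ids = [f"oai:example.org:{n}" for n in range(1, 6)]
print(harvest(ids, "oai_dc"))
```

Because the state lives in the token rather than in a server-side session, a harvester that retries after a network failure gets the same page back.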

10 OAI Implementation Guidelines for Repositories
- Tools required
- Basic program strategies (incl. object-oriented approaches)
- Guidance for use of optional container elements
- Metadata generation / mapping, data cleaning
- Use of OAI Sets
- resumptionToken, flow control, load-balancing
- Denial-of-service prevention
- Error handling
- Strategies for deleted metadata records
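Error handling and deleted records look roughly like this from the harvester's side. The sketch is an assumption about one reasonable client strategy, not guidance from the talk: the sample responses, the `OAIError` class, and the `apply_response` helper are hypothetical, while `status="deleted"`, `noRecordsMatch`, and `badResumptionToken` are defined by the protocol.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


class OAIError(Exception):
    """Raised when a response carries an OAI-PMH error code."""


def apply_response(xml_text, store):
    """Update a local {identifier: datestamp} store from one response."""
    root = ET.fromstring(xml_text)
    error = root.find(f"{OAI}error")
    if error is not None:
        code = error.get("code")
        if code == "noRecordsMatch":
            return store  # nothing changed; not a failure
        raise OAIError(code)
    for header in root.iter(f"{OAI}header"):
        identifier = header.findtext(f"{OAI}identifier")
        if header.get("status") == "deleted":
            store.pop(identifier, None)  # provider withdrew the record
        else:
            store[identifier] = header.findtext(f"{OAI}datestamp")
    return store


# Hypothetical, abbreviated responses for illustration only.
LISTING = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header status="deleted">
      <identifier>oai:example.org:1</identifier>
      <datestamp>2005-03-10</datestamp>
    </header>
    <header>
      <identifier>oai:example.org:2</identifier>
      <datestamp>2005-03-01</datestamp>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""

BAD_TOKEN = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <error code="badResumptionToken">Token expired</error>
</OAI-PMH>"""

store = {"oai:example.org:1": "2005-01-15"}
apply_response(LISTING, store)
print(store)
```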

11 Why OAI?
- OAI is not synonymous with open access: the content provider maintains access control over full content
- Implement once, provide metadata to multiple services
- Less performance impact than robotic Web harvesting
- Simpler than Z39.50
- Puts your metadata in additional portals
- But, less control over:
  - How your metadata is presented to the end-user
  - What your metadata is put next to by service providers
- How valuable a commodity is your metadata?

12 Who’s Using OAI to Expose Metadata
- OAI Data Provider Registry, as of 1 March 2005:
  - 607 active OAI metadata provider repositories
  - Range in size from millions of items to fewer than 100 items
  - More than half are institutional repositories or eprint archives
- Handful of publishers / publisher-aggregators, e.g.: PubMed Central; BioOne; BioMed Central (partial); Project Euclid; Africa Journals Online; Institute of Physics (user id & password); American Physical Society (restricted access); ...
- Individual journals, e.g.: J. of STEM Education; Electronic J. of Probability; J. of Cognitive Affective Learning; Canadian J. of Communication; ...

13 Who’s Harvesting Metadata Using OAI-PMH
- Portals encouraging Open Access, e.g.:
  - OAIster; Public Knowledge Project; Citebase; Cyclades; ...
  - NSDL (STEM Education); NCSTRL (computer science); SAIL (physical science e-prints); ...
- Local harvesting projects
  - As a way to share data internally
  - As a collation service to their users, e.g., Grainger Search Service
  - OAI harvesting supported by some library meta-search utilities
- Web search engines that use OAI as one input stream
  - Yahoo! ingests from OAIster; Google looking to harvest DSpace sites; Scirus includes OAI metadata; ...
- mod_OAI (Apache Web servers) as an alternative to Web robotic harvesting?

14 Indirect Benefits from OAI-PMH
From the Bibusages study (French National Library):
- Digital libraries are used in conjunction with Web search engines, generalist portals, commercial sites
- Mix of intensive & casual users
- DL users are seeking an answer to a specific information need; most time is spent discovering, viewing, & downloading documents
- “Digital Libraries … are now attracting a new type of public, bringing about new, unique and original ways for reading and understanding texts.”
Houssem Assadi, et al. “Users & Uses of Online Digital Libraries in France,” ECDL 2003

15 Evolution of Scholarly Communication
- Ubiquitous nature of electronic pre-prints & post-prints
- Extensive linking to supporting content on the Web
- Mixing of author-paid publication with traditional subscription-based business models (e.g., AIP, Springer trials)
- Citation frequency up for articles also available in arXiv:
  - Greg J. Schwarz and Robert C. Kennicutt, Jr., “Demographic and citation trends in Astrophysical Journal papers and preprints,” BAAS 36, 2004
- Some publishers encouraging self-archiving of pre-prints
  - IMS; APS; AIP; ...
- OAI-PMH underpins these kinds of self-archiving services

16 A Librarian’s Perspective
“The information landscape can be seen as a contour map in which there are mountains, hillocks, valleys, plains and plateaus…. A specialized collection of particular importance is like a sharp peak. Upon a plateau there might be undulations representing strengths and weaknesses…. The landscape is, however, multidimensional. Where one scholar may see a peak another may see a trough. The task is to devise mapping conventions which enable scholars to read the map of the landscape fruitfully, at the appropriate level of generality or specificity.”
Michael Heaney (2000), “An Analytical Model of Collections and their Catalogues.”

