
1 OAI-PMH for Resource Harvesting Tutorial, OAI4, October 20th 2005, CERN, Geneva, Switzerland
The American Physical Society Project: Standards-based Mirroring of Digital Library Content
Jeroen Bekaert and Herbert Van de Sompel
Digital Library Research & Prototyping Team, Research Library, Los Alamos National Laboratory
This work was supported in part by the Library of Congress.

2 Context
Add the APS collection to the locally hosted LANL collection:
o Remain permanently synced
o Ensure correctness of locally stored APS data
Bigger picture:
o Archive APS content
o Create an efficient content transfer/mirroring approach between information providers and LANL
o NDIIPP: create an efficient content transfer/mirroring approach between heterogeneous content repositories
  - Efficient mechanisms are largely non-existent
  - Devise a standards-based approach: MPEG-21 DIDL, OAI-PMH, W3C XML Signatures

3 Bigger picture: OAIS perspective

4 APS / LANL mirroring process
[Diagram: APS repository (OAI-PMH repository) exchanging OAI-PMH requests and responses with LANL pre-ingest & ingest (OAI-PMH harvester), which feeds the aDORe repository]
o An APS Digital Object is represented as an application-neutral MPEG-21 DIDL document and exposed through an OAI-PMH front-end.
o Each datastream provided via a DIDL document is accorded a digest. The digests are delivered in the DIDL document via W3C XML Signatures.
o The complete DIDL document is accorded a digest, delivered in the OAI-PMH "about" container via a W3C XML Signature.
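To make the per-datastream digest concrete, here is a minimal sketch that computes a digest and wraps it in an XML Signature `ds:Reference` element. It assumes SHA-1 as the digest algorithm, identified by the XML-DSig URI `http://www.w3.org/2000/09/xmldsig#sha1`. The `#datastream-1` URI is a hypothetical pointer to a datastream. A full signature would also need `SignedInfo`, canonicalization and a `SignatureValue`, which are omitted here.

```python
import base64
import hashlib
import xml.etree.ElementTree as ET

DSIG_NS = "http://www.w3.org/2000/09/xmldsig#"
SHA1_URI = DSIG_NS + "sha1"  # XML-DSig algorithm identifier for SHA-1 (assumed choice)

ET.register_namespace("ds", DSIG_NS)


def digest_value(data: bytes) -> str:
    """Base64-encoded SHA-1 digest, the form carried in ds:DigestValue."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def make_reference(uri: str, data: bytes) -> ET.Element:
    """Build a ds:Reference for one datastream; the URI scheme is hypothetical."""
    ref = ET.Element(f"{{{DSIG_NS}}}Reference", URI=uri)
    ET.SubElement(ref, f"{{{DSIG_NS}}}DigestMethod", Algorithm=SHA1_URI)
    ET.SubElement(ref, f"{{{DSIG_NS}}}DigestValue").text = digest_value(data)
    return ref


if __name__ == "__main__":
    datastream = b"%PDF-1.4 example bytes"  # stand-in for a harvested datastream
    print(ET.tostring(make_reference("#datastream-1", datastream), encoding="unicode"))
```

The same `digest_value` could be applied to the serialized DIDL document as a whole to produce the record-level digest placed in the OAI-PMH "about" container.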

5 APS / LANL mirroring process
Remain synced via OAI-PMH datestamp-based harvesting of DIDL documents:
o New APS Digital Objects
o Updated APS Digital Objects
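Datestamp-based syncing can be sketched as an OAI-PMH `ListRecords` loop:

- It passes the datestamp of the previous harvest as the `from` argument.
- It follows `resumptionToken`s until the list is complete.

The base URL and the `didl` metadata prefix below are hypothetical. The `fetch` callable is injected so the sketch runs without network access; in practice it might be `lambda u: urllib.request.urlopen(u).read()`.

```python
import urllib.parse
import xml.etree.ElementTree as ET

OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"


def list_records_url(base_url, metadata_prefix=None, from_date=None, token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if token:
        # resumptionToken is an exclusive argument in OAI-PMH 2.0
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date:
            params["from"] = from_date  # datestamp of the previous harvest
    return base_url + "?" + urllib.parse.urlencode(params)


def parse_page(xml_text):
    """Return ([(identifier, datestamp), ...], resumption token or None)."""
    root = ET.fromstring(xml_text)
    headers = [
        (h.findtext(OAI_NS + "identifier"), h.findtext(OAI_NS + "datestamp"))
        for h in root.iter(OAI_NS + "header")
    ]
    tok = root.find(f".//{OAI_NS}resumptionToken")
    # An absent or empty resumptionToken marks the last page of the list
    token = tok.text.strip() if tok is not None and tok.text and tok.text.strip() else None
    return headers, token


def incremental_harvest(base_url, metadata_prefix, last_datestamp, fetch):
    """Collect headers of records new or updated since last_datestamp."""
    records, token = [], None
    while True:
        url = list_records_url(base_url, metadata_prefix, last_datestamp, token)
        page, token = parse_page(fetch(url))
        records.extend(page)
        if token is None:
            return records
```

Storing the latest datestamp seen after each run gives the `from` value for the next run, which picks up both new and updated Digital Objects.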

6 APS / LANL mirroring process
Datastreams are delivered By-Value and/or By-Reference:
o By-Reference requires dereferencing of the datastream after harvesting
Storage in the pre-ingest area:
o Harvested DIDL documents in XMLtapes
o Dereferenced content in ARC files
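A harvester has to tell the two delivery modes apart before it can decide what to dereference. The sketch below makes that split, assuming the MPEG-21 DIDL 2002 namespace and the DIDL convention that a `Resource` carries a `ref` attribute when delivered By-Reference and inline (optionally base64-encoded) content when delivered By-Value. The actual structure of the APS DIDL documents is not shown in the slides.

```python
import base64
import xml.etree.ElementTree as ET

DIDL_NS = "{urn:mpeg:mpeg21:2002:02-DIDL-NS}"


def split_datastreams(didl_xml):
    """Separate By-Value from By-Reference datastreams in a DIDL document.

    Returns (by_value, by_reference): a list of inline payloads as bytes,
    and a list of URIs that must be dereferenced after harvesting.
    """
    root = ET.fromstring(didl_xml)
    by_value, by_reference = [], []
    for res in root.iter(DIDL_NS + "Resource"):
        ref = res.get("ref")
        if ref:
            by_reference.append(ref)  # By-Reference: fetch later
            continue
        # By-Value: only text content is read; inline XML children
        # are not serialized in this sketch
        text = (res.text or "").strip()
        if res.get("encoding") == "base64":
            by_value.append(base64.b64decode(text))
        else:
            by_value.append(text.encode("utf-8"))
    return by_value, by_reference
```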

7 APS / LANL mirroring process
Verification of digests:
o DIDL document
o Datastreams
Digest correct: continue
Digest incorrect: reharvest
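The "correct: continue / incorrect: reharvest" rule can be sketched as a small retry loop. It assumes SHA-1 digests in base64 form, as in XML Signature `DigestValue`. The `fetch` callable returns the payload together with the digest delivered alongside it. The `max_attempts` cap is an added assumption; the slides only say to reharvest.

```python
import base64
import hashlib


def sha1_b64(data: bytes) -> str:
    """Base64-encoded SHA-1 digest of data."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def fetch_verified(fetch, max_attempts=3):
    """Call fetch() -> (payload, expected_digest) until the digest checks out.

    Digest correct: return the payload and continue with ingest.
    Digest incorrect: reharvest, up to max_attempts times in total.
    """
    for _ in range(max_attempts):
        payload, expected = fetch()
        if sha1_b64(payload) == expected:
            return payload
    raise ValueError(f"digest mismatch after {max_attempts} attempts")
```

The same check applies at both levels named on the slide: once for the DIDL document against the digest in the "about" container, and once per datastream against its digest in the DIDL document.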

8 APS / LANL mirroring process
Ingest Digital Objects:
o Map application-neutral DIDL documents to aDORe-profile DIDL documents
o Insert digests per constituent datastream (W3C XML Signatures)
o Store in the aDORe XMLtape/ARC file environment
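The XMLtape idea is that many XML records are concatenated into a single file that can still be accessed per record. The sketch below writes such a tape and keeps an offset index for random access. The `<tape>` and `<record>` element names and the index layout are placeholders chosen for illustration. They are not the aDORe XMLtape schema. Records are assumed to be XML fragments without their own XML declaration.

```python
import io
from xml.sax.saxutils import quoteattr


def write_tape(records, out):
    """Concatenate (identifier, xml_bytes) records into one XML 'tape'.

    Element names are placeholders, NOT the aDORe XMLtape schema.
    Returns {identifier: (offset, length)} so records can be read back
    by seeking directly into the tape.
    """
    index = {}
    out.write(b'<?xml version="1.0" encoding="UTF-8"?>\n<tape>\n')
    for identifier, xml_bytes in records:
        start = out.tell()
        out.write(b"<record id=" + quoteattr(identifier).encode("utf-8") + b">")
        out.write(xml_bytes)
        out.write(b"</record>\n")
        index[identifier] = (start, out.tell() - start)
    out.write(b"</tape>\n")
    return index


if __name__ == "__main__":
    buf = io.BytesIO()
    print(write_tape([("oai:example:1", b"<didl/>")], buf))
```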

9 APS / LANL mirroring process
o Recurrent introspection in both repositories
o Ability to harvest in both directions in case of problems with stored Digital Objects
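One way to read "harvest in both directions" is as a repair planner driven by introspection results. The sketch assumes that each repository reports, per identifier, the digest recorded at ingest and a freshly recomputed digest. An intact copy on one side is then used to repair a damaged or missing copy on the other. The input shape and the action labels are hypothetical.

```python
def plan_repairs(aps, lanl):
    """Decide repair direction from introspection results.

    aps, lanl: {identifier: (recorded_digest, recomputed_digest)}.
    A copy is intact when its recomputed digest equals the recorded one;
    a missing identifier counts as not intact.
    Returns a sorted list of (identifier, action) for objects needing repair.
    """
    actions = []
    for ident in sorted(set(aps) | set(lanl)):
        aps_ok = ident in aps and aps[ident][0] == aps[ident][1]
        lanl_ok = ident in lanl and lanl[ident][0] == lanl[ident][1]
        if aps_ok and not lanl_ok:
            actions.append((ident, "LANL harvests from APS"))
        elif lanl_ok and not aps_ok:
            actions.append((ident, "APS harvests from LANL"))
        elif not aps_ok and not lanl_ok:
            actions.append((ident, "no intact copy: manual intervention"))
    return actions
```

Objects that are intact on both sides need no action here. Legitimate updates on the APS side are picked up by the regular datestamp-based harvest.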

10 Software
OAIResource: a generic Java-based OAI-PMH resource harvesting software package:
o Goal: gather resources, using OAI-PMH harvesting as the first step
o Can deal with OAI-PMH repositories irrespective of their supported metadata formats
o Plug-in structure makes the process of dereferencing datastreams configurable per OAI-PMH repository
o Results of harvesting/gathering are stored as follows:
  - OAI-PMH records concatenated into XMLtapes
  - Datastreams concatenated into Internet Archive ARC files
o Log files:
  - List successful and unsuccessful harvesting/gathering
  - List the relationship between OAI-PMH records in XMLtapes and datastreams in ARC files
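This sketch in Python, not the OAIResource package's actual Java API, shows the storage and logging steps. Dereferenced datastreams are appended to an ARC-style file, and a log row records which OAI-PMH record each datastream belongs to and where it landed. Each record follows the ARC version 1 URL-record header line, `URL IP-address Archive-date Content-type Archive-length`, with a 14-digit date. The sketch omits the `filedesc://` version block that starts a real ARC file, as well as compression. The `0.0.0.0` IP default and the log tuple layout are hypothetical.

```python
import io


def append_arc_record(arc, url, content, content_type, date14, ip="0.0.0.0"):
    """Append one ARC v1-style URL record and return its byte offset.

    Header line: URL IP-address Archive-date Content-type Archive-length
    """
    offset = arc.tell()
    header = f"{url} {ip} {date14} {content_type} {len(content)}\n"
    arc.write(header.encode("ascii"))
    arc.write(content)
    arc.write(b"\n")  # record separator
    return offset


def gather(record_id, datastreams, arc, arc_name, log):
    """Store the dereferenced datastreams of one OAI-PMH record.

    datastreams: iterable of (url, content_bytes, content_type, date14).
    Appends (record_id, url, arc_name, offset) rows to log, linking the
    OAI-PMH record to its datastreams in the ARC file.
    """
    for url, content, content_type, date14 in datastreams:
        offset = append_arc_record(arc, url, content, content_type, date14)
        log.append((record_id, url, arc_name, offset))


if __name__ == "__main__":
    arc, log = io.BytesIO(), []
    gather("oai:example:1",
           [("http://example.org/a.pdf", b"PDF", "application/pdf", "20051020120000")],
           arc, "example.arc", log)
    print(log)
```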

11 Papers
Jeroen Bekaert and Herbert Van de Sompel. A Standards-based Solution for the Accurate Transfer of Digital Assets. D-Lib Magazine, June 2005. http://dx.doi.org/10.1045/june2005-bekaert
Jeroen Bekaert and Herbert Van de Sompel. Access Interfaces for Open Archival Information Systems based on the OAI-PMH and the OpenURL Framework for Context-Sensitive Services. 2005. Draft of an accepted submission for PV 2005 "Ensuring Long-term Preservation and Adding Value to Scientific and Technical data". Preprint at http://arxiv.org/abs/cs.DL/0509090
Herbert Van de Sompel, Jeroen Bekaert, Xiaoming Liu, Lyudmila Balakireva, and Thorsten Schwander. aDORe: a modular, standards-based Digital Object Repository. The Computer Journal, 2005. doi:10.1093/comjnl/bxh114. Preprint at arXiv:cs.DL/0502028
