LWW January 27, 2004, Los Alamos, NM LANL Ingestion and Repository architecture Research Library, Los Alamos National Laboratory RESEARCH LIBRARY LANL’s.

Presentation transcript:

LWW January 27, 2004, Los Alamos, NM
LANL Ingestion and Repository architecture
Research Library, Los Alamos National Laboratory
LANL’s Repository Architecture: An Overview
Digital Library Research & Prototyping Team
Los Alamos National Laboratory Research Library

Context
Uniform approach for storing and disseminating LANL data collections
Interesting characteristics of the repository architecture:
  o Distributed by design
  o Use of MPEG-21 DIDL to represent complex objects ~ DIDs
  o Multi-faceted use of OAI-PMH to access the repository
  o Use of NISO OpenURL to access the repository
  o Dynamic binding of behaviors to DIDs
  o Use of XMLTape for storing collections of DIDs
  o Use of Internet Archive ARC files for storing bitstreams

Presentation Overview
Walk-through of the LANL Repository Architecture:
  o Ingest process
  o MPEG-21 DIDL
  o OAI-PMH repositories
  o Repository Index
  o Identifier Resolver
  o OAI-PMH Federator
  o OpenURL Gateway
Discussion of potential impact of the Repository effort beyond LANL:
  o Transfer of complex objects via the OAI-PMH: recurrent transfer of data feeds, mirroring/syncing of archives, …
  o Federation of Institutional Repositories: Beyond Dublin Core
  o mod_oai: OAI-PMH and web crawling

Overview of the LANL architecture
[Architecture diagram. Labels: publisher, A&I publisher, Indata.lanl.gov, Pre-Ingest, Ingest, TechReport, A&I, FTXT, OAI-PMH, Repo Index, Identifier Resolver, CNRI handle, Java, C, OpenURL, MPEG-21 DIP Engine, Registry of transformations, Profile/Behavior Registry, DID, DID with DIM, APPLICATION.]

Pre-Ingest: Data input from vendor
Data feeds from third parties:
  o Are delivered in various ways (HTTP, FTP, ...)
  o Have many different formats
  o Upon delivery, are stored in pre-ingestion area
  o Typically contain many items in a single feed

Ingest: Creation of DIDs & XMLtapes
Pre-ingestion area is monitored for deliveries
New deliveries are processed for ingestion:
  o An MPEG-21 DIDL object – a DID – is created per delivered item.
  o All DIDs of the delivery are concatenated into a single XML file: the XMLtape
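
The two ingest steps on this slide can be sketched roughly as follows. This is a minimal illustration, not the LANL implementation: the element names are simplified (DIDL namespaces omitted), the wrapper element `XMLtape`, the function names and the sample delivery are all hypothetical.

```python
import xml.etree.ElementTree as ET

def build_did(item_id, content):
    """Wrap one delivered item in a minimal DIDL-like structure (namespaces omitted)."""
    didl = ET.Element("DIDL")
    item = ET.SubElement(didl, "Item", {"id": item_id})
    component = ET.SubElement(item, "Component")
    resource = ET.SubElement(component, "Resource", {"mimeType": "text/plain"})
    resource.text = content
    return didl

def build_xmltape(delivery):
    """Concatenate one DID per delivered item under a single root element."""
    tape = ET.Element("XMLtape")  # hypothetical wrapper element name
    for item_id, content in delivery:
        tape.append(build_did(item_id, content))
    return ET.tostring(tape, encoding="unicode")

# Hypothetical delivery containing two items.
delivery = [("item-1", "first record"), ("item-2", "second record")]
tape_xml = build_xmltape(delivery)
```

Each delivered item becomes its own DID, so one delivery yields exactly one tape with as many DIDs as items.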

MPEG-21 DIDL - 1. Data Model
Abstract Definitions + W3C XML Schema
Entities
  o a Container: didl:Container
  o an Item: didl:Item
  o a Component: didl:Component
  o a Resource: didl:Resource
  o a Descriptor: didl:Descriptor
  o …
Remarks
  o Defined LANL DIDL profile, remaining fully DIDL compliant
  o We create concrete DIDL profiles ‘per collection’

MPEG-21 DIDL - 1. Data Model
[Figure: DIDL data model.]

MPEG-21 DIDL - 2. Descriptors
Secondary information pertaining to Entities
  o MPEG-21 defined uses
    - identification information – MPEG-21 Part 3: DII
    - rights information – MPEG-21 Part 5: REL / Part 4: IPMP
    - processing information – MPEG-21 Part 10: DIP
  o community/application specific uses
    - cf. use of Descriptors in LANL profile

MPEG-21 DIDL - 2. Descriptors - Identifiers
[XML example: a Descriptor holding an MPEG-21 dii:Identifier with the value urn:isbn: …]

MPEG-21 DIDL - 2. Descriptors - rights
[XML example: a Descriptor holding an MPEG-21 r:license with the text “… Copyright 2003; American Physical Society …”]

MPEG-21 DIDL - 2. Descriptors - behaviors
[XML example, two Items:
  o Content Item: MPEG-21 dip:ObjectType with the value urn:foobar:Argument
  o Processing Item: MPEG-21 dip:Argument with the value urn:foobar:Argument, and the method function PlayTrack() { }]

MPEG-21 DIDL LANL Profile
2 questions:
  o How to map datastreams of complex objects of the LANL repository to the DIDL data model
  o How to use Descriptors to meet the design goals of the repository and its associated applications
LANL DID profile, explained by means of the following example:
  o A complex object consisting of
    - LANL technical report
      – 1 file: pdf
      – id = info:lanl-repo/tr/LA
    - metadata record
      – 2 versions: raw MARC record and derived MARCXML file
      – id = info:lanl-repo/opac/LANLb

LANL Profile representation
[Figure: DIDL representation of the example object, with parts labeled “LANL technical report” and “MARC record”.]

Other characteristics of LANL DIDL Profile
Bitstream handling:
  o Inline XML data (such as MARCXML, …)
  o Pointers to bitstreams stored in ARC files
  o DIDs become uniform proxies to the heterogeneous ‘real stuff’
LANL DIDL Profile & collection Profiles enforced using Schematron
Digests – DID-level & bitstream-level – included in DIDs (W3C XML Signature)
Handling of identifiers:
  o DID identifiers ~ XML structure
  o Content identifiers ~ actual content
Creation dates
  o XML documents and constituent XML elements
  o datastreams
Collections
‘Format’ information (in addition to DIDL mimeType)

Identifiers in LANL DIDL Profile
2 types of identifiers
DID-identifier ~ identifier(s) associated with XML document/structure
  o DIDL root level: info:lanl-repo/i/UUID1
  o Container-level: info:lanl-repo/i/UUID1#UUIDx
  o Item-level: info:lanl-repo/i/UUID1#UUIDy
  o Component-level: info:lanl-repo/i/UUID1#UUIDz
Content-identifier ~ identifier associated with content
  o Item-level: info URI, XML IDs
MPEG-21 DII Descriptor
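
The DID-identifier pattern above (a root identifier plus a `#fragment` per nested element) might be generated and taken apart along these lines. The slide only shows placeholders such as UUID1 and UUIDx; using Python's random `uuid4` values and the helper names below are assumptions made for illustration.

```python
import uuid

DID_PREFIX = "info:lanl-repo/i/"  # prefix taken from the slide

def new_did_identifier():
    """Identifier for the DIDL root level (UUID flavour is an assumption)."""
    return DID_PREFIX + str(uuid.uuid4())

def element_identifier(did_identifier):
    """Identifier for a Container, Item or Component inside the given DID."""
    return f"{did_identifier}#{uuid.uuid4()}"

def split_identifier(identifier):
    """Split into (DID identifier, fragment); fragment is None at root level."""
    did, _, fragment = identifier.partition("#")
    return did, fragment or None
```

Because every nested identifier starts with the root identifier, the DID that contains an element can be recovered from the element's identifier alone.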

Identifiers in LANL DIDL Profile
[Figure: the example DID annotated with its identifiers.
  o DID-identifiers: info:lanl-repo/i/UUID1 at the root; fragments #UUIDx, #UUIDy, #UUIDz, #UUIDa, #UUIDb on nested elements
  o Content-identifiers (dii): info:lanl-repo/tr/LA-9870 for the LANL technical report; info:lanl-repo/opac/LANLb for the MARC record]


Creation dates in LANL DIDL
[Figure: the example DID (LANL technical report, MARC record) annotated with dcterms.created values …T15:42:16Z and …T13:04:21Z.]

Collections in LANL DIDL Profile
[Figure: the example DID (LANL technical report, MARC record) annotated with collection membership:
  o dcterms.isPartOf Info:sid/library.lanl.gov:TR
  o dcterms.isPartOf Info:sid/library.lanl.gov:OPAC]

Indication of bibliographic data in LANL DIDL Profile
[Figure: the example DID (LANL technical report, MARC record) annotated with dc.type.]

Indication of rights in LANL DIDL Profile
[Figure: the example DID (LANL technical report, MARC record) annotated with dc.rights: textual statement.]

‘Formats’ in LANL DIDL Profile
[Figure: the example DID (LANL technical report, MARC record) annotated with format information:
  o dc.format info:lanl-repo/fmt/1
  o dc.format info:lanl-repo/pro/metadata
  o dc.format info:lanl-repo/pro/ai
  o content-stream:text:structured-text:mark-up-lang:xml#application/marc+xml]

‘Formats’ as placeholder for dynamic behaviors
[Figure: stored DID versus disseminated DID.
  o Stored DID: carries the format identifier info:lanl-repo/fmt/1
  o Profile/Behavior Registry: source for the dynamic insertion of behaviors
  o Disseminated DID: contains a Content Item with MPEG-21 dip:ObjectType (urn:foobar:Argument) and a Processing Item with MPEG-21 dip:Argument (urn:foobar:Argument) and the method function PlayTrack() { }]

XMLTape: XML wrapper for DIDs
XMLTape: sequential storage of DIDs
  o Zipped
  o Index based on byte offset and byte count in zipped file
DID content:
  o inline metadata
  o inline XML
  o secondary information
  o pointers to content
DID resources in ARC files
[Diagram: an XMLTape holding a sequence of DIDs, each recorded with its DID-identifier and datestamp of creation.]
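
An index based on byte offset and byte count can be illustrated with a small in-memory sketch. It works on an uncompressed byte string, whereas the slide describes a zipped file; the wrapper element `<tape>`, the function names and the sample DIDs are hypothetical.

```python
import io

def write_tape(dids):
    """Write DIDs sequentially; return (tape bytes, index).

    dids: iterable of (did_identifier, datestamp, xml_string).
    index: did_identifier -> (datestamp, byte offset, byte count).
    """
    buf = io.BytesIO()
    index = {}
    buf.write(b"<tape>\n")  # hypothetical wrapper element
    for did_id, datestamp, xml in dids:
        data = xml.encode("utf-8")
        index[did_id] = (datestamp, buf.tell(), len(data))
        buf.write(data)
        buf.write(b"\n")
    buf.write(b"</tape>\n")
    return buf.getvalue(), index

def read_did(tape_bytes, index, did_id):
    """Fetch one DID by slicing the tape; no XML parsing of the whole tape needed."""
    _, offset, count = index[did_id]
    return tape_bytes[offset:offset + count].decode("utf-8")

# Hypothetical sample data; the first DID contains a non-ASCII character,
# which is why offsets are counted in bytes rather than characters.
SAMPLE = [
    ("info:lanl-repo/i/a", "2004-01-27T00:00:00Z", "<DIDL>caf\u00e9</DIDL>"),
    ("info:lanl-repo/i/b", "2004-01-27T00:00:01Z", "<DIDL>two</DIDL>"),
]
tape_bytes, tape_index = write_tape(SAMPLE)
```

Keeping the identifier and datestamp in the index is what lets a tape answer identifier and date-based requests without scanning its contents.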

ARC: sequential storage of DID Resources
[Diagram: a DID in an XMLTape points to a resource stored in an ARC file. The ARC Index maps arc id 1, arc id 2, arc id 3 to ARC pointer 1, ARC pointer 2, ARC pointer 3.]

Overview of the LANL architecture
[Architecture diagram. Labels: publisher, A&I publisher, Indata.lanl.gov, Pre-Ingest, Ingest, TechReport, A&I, FTXT, OAI-PMH, Repo Index, Identifier Resolver, CNRI handle, Java, C, OpenURL, MPEG-21 DIP Engine, Registry of transformations, Profile/Behavior Registry, DID, DID with DIM, Verity.]

Making DIDs accessible through the OAI-PMH
[Diagram labels: publisher, A&I publisher, Ingest, Expose; collections techReport at baseURL(1), A&I at baseURL(2), FTXT at baseURL(3).]
Mapping to OAI-PMH:
  o OAI-PMH identifier
  o OAI-PMH datestamp
  o OAI-PMH response = DIDs
  o OAI-PMH sets
    - Collection = dcterms.isPartOf
    - Profile ~ Digital Format Identifier = dc.format

Overview of the LANL architecture
[Architecture diagram repeated; same components as the earlier overview slide.]

Repository Index: keeping track of OAI-PMH repositories
[Diagram labels: Repo Index at baseURL(index); Expose; baseURL(1) techReport; baseURL(2) A&I; baseURL(3).]
  o STEP 1: ListIdentifiers (OAI-PMH) on the Repository Index: baseURL(1), baseURL(2), baseURL(3)
  o STEP 2: ListRecords (OAI-PMH): list of DIDs
Example 1, Example 2
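
The two steps could be expressed as OAI-PMH request URLs roughly as below. `verb`, `metadataPrefix` and `from` are standard OAI-PMH request arguments; the host names, the `didl` metadata prefix and the helper names are hypothetical, and in a real harvest the repository baseURLs would come from parsing the step 1 response rather than being passed in.

```python
from urllib.parse import urlencode

def oai_request(base_url, verb, args=None):
    """Build an OAI-PMH request URL for the given verb and arguments."""
    params = {"verb": verb}
    params.update(args or {})
    return base_url + "?" + urlencode(params)

def harvest_urls(index_base_url, repository_base_urls, metadata_prefix, since=None):
    """Return the step 1 request (index) and one step 2 request per repository."""
    args = {"metadataPrefix": metadata_prefix}
    if since is not None:
        args["from"] = since  # selective harvesting by datestamp
    step1 = oai_request(index_base_url, "ListIdentifiers", args)
    step2 = [oai_request(url, "ListRecords", args) for url in repository_base_urls]
    return step1, step2

# Hypothetical endpoints standing in for baseURL(index) and baseURL(1).
step1, step2 = harvest_urls(
    "http://index.example.org/oai",
    ["http://repo1.example.org/oai"],
    "didl",
    since="2004-01-01",
)
```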

Repository Index
Registration of OAI-PMH repository in Repository Index is done during …
Implementation:
  o Generic MySQL-based OAI-PMH repository with OCLC’s OAICat as front-end

Overview of the LANL architecture
[Architecture diagram repeated; same components as the earlier overview slide.]

Identifier Resolver: locating DIDs and DID Items/Resources
Request: DID-id or content-id (DID-id, Content-id, ark id)
Response: baseURL & DID-id
Resolver table (identifier, datestamp, repository):
  o DID-id: baseURL(1) & DID-id 1
  o Content-id: baseURL(2) & DID-id x
  o Content-id: baseURL(x) & DID-id y
[Diagram labels: Identifier Resolver, monitors, Repo Index at baseURL(index), Expose, baseURL(2) A&I, techReport.]

Identifier Resolver with history for Content Identifiers
Identifier | Repository Location: baseURL | protocol | Repository Id | Extension (XML ID)
info:lanl-repo/i/UUID1 | baseURL1 | OAI-PMH | info:lanl-repo/i/UUID1 |
info:lanl-repo/opac/LANLb | baseURL1 | OAI-PMH | info:lanl-repo/i/UUID1 | UUID2
info:lanl-repo/tr/LA-9870 | baseURL1 | OAI-PMH | info:lanl-repo/i/UUID1 | UUID3
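
A lookup over the table above might look like the following sketch. The rows are the three shown on the slide; the in-memory dictionary, the concrete URL substituted for baseURL1, the `didl` metadata prefix and the function names are assumptions, and the history aspect of the resolver is not modelled.

```python
from typing import NamedTuple, Optional
from urllib.parse import urlencode

class Location(NamedTuple):
    base_url: str
    protocol: str
    repository_id: str        # identifier of the DID within the repository
    extension: Optional[str]  # XML ID inside the DID, if any

BASE_URL_1 = "http://repo1.example.org/oai"  # hypothetical value for baseURL1

RESOLVER = {
    "info:lanl-repo/i/UUID1": Location(BASE_URL_1, "OAI-PMH", "info:lanl-repo/i/UUID1", None),
    "info:lanl-repo/opac/LANLb": Location(BASE_URL_1, "OAI-PMH", "info:lanl-repo/i/UUID1", "UUID2"),
    "info:lanl-repo/tr/LA-9870": Location(BASE_URL_1, "OAI-PMH", "info:lanl-repo/i/UUID1", "UUID3"),
}

def resolve(identifier):
    """Return where a DID-id or content-id lives, or None if unknown."""
    return RESOLVER.get(identifier)

def get_record_url(location, metadata_prefix="didl"):
    """OAI-PMH GetRecord request for the DID that holds the resolved content."""
    query = urlencode({
        "verb": "GetRecord",
        "identifier": location.repository_id,
        "metadataPrefix": metadata_prefix,
    })
    return location.base_url + "?" + query
```

A content identifier resolves to the same DID as the DID identifier itself; only the extension column differs, pointing at the element inside the DID.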

Identifier Resolver
Is loaded:
  o Through OAI-PMH harvesting for ‘regular’ updating
  o Through batch mechanism for bulk loading of new collections
Example 1 – select identifiers
Example 2 – resolve identifiers

Overview of the LANL architecture
[Architecture diagram repeated; same components as the earlier overview slide.]

OAI-PMH Federator: single point of access to DIDs
OAI-PMH DID-level access
[Diagram labels: Expose, A&I, techReport, FTXT, OAI-PMH Federator, DID, MPEG-21 DIP Engine, Registry of transformations, Profile/Behavior Registry, DID with PI. Output formats: DID, METS, SCORM, …]
OAI-PMH sets:
  o baseURL (set = baseURL(1), set = baseURL(2), set = baseURL(3))
  o Collection
  o Format

DIM Inserter: dynamic insertion of behaviors

OAI-PMH Federator
Exposes complete LANL repository as a single OAI-PMH repository.
OAI-PMH Federator provides:
  o Single point of access
  o Facility to transform stored DIDs (e.g. identifiers only)
Downstream applications define harvesting projects to collect data. E.g.:
  o Verity
  o Identifier Resolver
  o Netrics
Harvesting projects specify values for OAI-PMH parameters
Implementation: based on OCLC’s OAIHarvester I
Example 1

OpenURL access to Items across repositories
OpenURL Item-level and DID-level access
OpenURL entities: Requester, …, ServiceType, Referent
[Diagram labels: Expose, A&I, techReport, FTXT, OpenURL, OAI-PMH, MPEG-21 DIP Engine, Registry of transformations, Profile/Behavior Registry, DID with PI, transformed content.]

OpenURL-based disseminations
Disseminate DIDs, DID items and transforms thereof
Example:
  the OpenURL front-end
  & rfr_id=info:sid/library.lanl.gov
  & url_ver=Z
  & rft_id=info:lanl-repo/biosis/PREV
  & svc_id=info:lanl-repo/svc/tomods.marc
Example 1 – extract MARCXML
Example 2 – extract BIOSIS XML
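
Assembling such a request can be sketched as follows. The keys rfr_id, url_ver, rft_id and svc_id are the ones shown on the slide; the front-end URL is hypothetical, the referent is borrowed from the technical-report example elsewhere in the deck, and `Z39.88-2004` is the version string of the OpenURL 1.0 standard, used here as an assumption because the slide's own value is cut off.

```python
from urllib.parse import parse_qs, urlencode, urlparse

def build_openurl(front_end, rfr_id, url_ver, rft_id, svc_id):
    """Assemble an OpenURL asking the gateway for a service on a referent."""
    params = [
        ("rfr_id", rfr_id),    # referrer: who is asking
        ("url_ver", url_ver),  # OpenURL version
        ("rft_id", rft_id),    # referent: the DID or item being requested
        ("svc_id", svc_id),    # service type: the dissemination wanted
    ]
    return front_end + "?" + urlencode(params)

def read_openurl(url):
    """Gateway-side view: recover the key/value pairs from the request."""
    return {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}

# Hypothetical front-end address; values partly assumed as described above.
example = build_openurl(
    "http://gateway.example.org/openurl",
    rfr_id="info:sid/library.lanl.gov",
    url_ver="Z39.88-2004",
    rft_id="info:lanl-repo/tr/LA-9870",
    svc_id="info:lanl-repo/svc/tomods.marc",
)
```

Percent-encoding the identifier values keeps the `:` and `/` characters of the info URIs from being confused with URL syntax.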

Summary of repository access methods
Access method | DID-level | Item-level
OAI-PMH – individual repositories | DIDL | ---
OAI-PMH – Federator | DIDL, METS, SCORM, IMS, ToC, … | ---
OpenURL Gateway | DIDL, METS, SCORM, ToC, … | Transforms of content
OAIS terms noted on the slide: Dissemination Information Package(s); Result Set

Discussion of potential impact
Use of OAI-PMH & complex objects opens a new realm of possibilities
OAI-PMH to recurrently transfer digital objects (represented as complex objects) between environments.
Transferred package (DIP at source; SIP at target) is independent of repository infrastructure at both ends.
Transferred package can contain digests that allow parties involved to recurrently check for bit-level issues.
  o Data feeds from publishers (cf. LANL/APS/LoC NDIIP project)
  o From IR to trusted archives: archiving (cf. DARE, KB)
  o Between trusted archives: mirroring (cf. LANL/APS/LoC NDIIP project)
  o Services based on content (cf. DARE, FAIR, DINI)

Example 1: LoC – APS – LANL project
Funded by Library of Congress NDIIP
OAI-PMH harvesting of APS content for ingestion in LANL & LoC repositories
Maps APS content to MPEG-21 DIDL structure
Ongoing work for inclusion of digest/signatures in DIDLs

Example 2: Old Dominion University & LANL mod_oai project
Funded by Andrew W. Mellon Foundation
Implement OAI-PMH plug-in for Apache Web servers
Will allow selective & incremental OAI-PMH harvesting of content hosted by Web servers
  o datestamp
  o sets ~ MIME type
  o initially static Web content
  o OAI-PMH identifiers == URLs
Two operating modes for crawlers:
  o General crawler: ListIdentifiers => URLs of Web content
  o Advanced crawler: ListRecords ~ Dublin Core and one or more complex object formats
OAI-PMH as a tool to make Web harvesting more efficient
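
The selective and incremental harvesting idea (datestamp, sets ~ MIME type, identifiers == URLs) can be illustrated with a small filter. This is a sketch of the selection logic only, not of mod_oai itself: the record type, the use of the raw MIME type as set value, the function name and the sample site are all hypothetical.

```python
from datetime import datetime, timezone
from typing import List, NamedTuple, Optional

class WebResource(NamedTuple):
    url: str            # doubles as the OAI-PMH identifier
    mime_type: str      # doubles as the set membership
    modified: datetime  # doubles as the OAI-PMH datestamp

def list_identifiers(resources, from_date: Optional[datetime] = None,
                     set_spec: Optional[str] = None) -> List[str]:
    """URLs changed since from_date, optionally restricted to one MIME type."""
    selected = []
    for resource in resources:
        if from_date is not None and resource.modified < from_date:
            continue  # unchanged since the last harvest
        if set_spec is not None and resource.mime_type != set_spec:
            continue  # outside the requested set
        selected.append(resource.url)
    return selected

# Hypothetical static Web content hosted by one server.
SITE = [
    WebResource("http://www.example.org/index.html", "text/html",
                datetime(2004, 1, 5, tzinfo=timezone.utc)),
    WebResource("http://www.example.org/paper.pdf", "application/pdf",
                datetime(2004, 1, 20, tzinfo=timezone.utc)),
    WebResource("http://www.example.org/old.html", "text/html",
                datetime(2003, 6, 1, tzinfo=timezone.utc)),
]
```

A crawler that remembers the date of its last visit only has to fetch the URLs this returns, which is where the efficiency gain over re-crawling everything would come from.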

Example 3: LANL DSpace plug-in prototype
Introduced at recent DSpace Federation meeting
Maps DSpace data model [item – bundle – component] to MPEG-21 DIDL data model [Container – Item – Resource]
Exposes MPEG-21 DIDL documents through built-in DSpace OAI-PMH infrastructure
Metadata (Dublin Core) and Content (MPEG-21 DIDL) harvestable via the OAI-PMH
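
The data model mapping named on this slide might be sketched as below. The input dictionary layout, the function name and the sample item are hypothetical and do not reflect the DSpace API; DIDL namespaces are omitted, and a Component element is added around each Resource because the DIDL data model places Resources inside Components.

```python
import xml.etree.ElementTree as ET

def dspace_item_to_didl(item):
    """Map item -> Container, bundle -> Item, component -> Resource."""
    didl = ET.Element("DIDL")
    container = ET.SubElement(didl, "Container")
    for bundle in item["bundles"]:
        didl_item = ET.SubElement(container, "Item")
        for component in bundle["components"]:
            wrapper = ET.SubElement(didl_item, "Component")
            ET.SubElement(wrapper, "Resource", {
                "mimeType": component["mime_type"],
                "ref": component["url"],  # pointer to the bitstream
            })
    return didl

# Hypothetical DSpace item with one bundle holding two files.
sample_item = {
    "bundles": [
        {"components": [
            {"mime_type": "application/pdf", "url": "http://dspace.example.org/1/report.pdf"},
            {"mime_type": "text/xml", "url": "http://dspace.example.org/1/report.xml"},
        ]},
    ],
}
didl_doc = dspace_item_to_didl(sample_item)
```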