LWW January 27, 2004, Los Alamos, NM LANL Ingestion and Repository architecture Research Library, Los Alamos National Laboratory RESEARCH LIBRARY LANL’s.


1 LANL’s Repository Architecture: An Overview
Digital Library Research & Prototyping Team
Los Alamos National Laboratory Research Library

2 Context
Uniform approach for storing and disseminating LANL data collections
Interesting characteristics of the repository architecture:
  o Distributed by design
  o Use of MPEG-21 DIDL to represent complex objects ~ DIDs
  o Multi-faceted use of OAI-PMH to access the repository
  o Use of NISO OpenURL to access the repository
  o Dynamic binding of behaviors to DIDs
  o Use of XMLTape for storing collections of DIDs
  o Use of Internet Archive ARC files for storing bitstreams

3 Presentation Overview
Walk-through of the LANL Repository Architecture:
  o Ingest process
  o MPEG-21 DIDL
  o OAI-PMH repositories
  o Repository Index
  o Identifier Resolver
  o OAI-PMH Federator
  o OpenURL Gateway
Discussion of potential impact of the Repository effort beyond LANL:
  o Transfer of complex objects via the OAI-PMH: recurrent transfer of data feeds, mirroring/syncing of archives, …
  o Federation of Institutional Repositories: Beyond Dublin Core
  o mod_oai: OAI-PMH and web crawling

4 Overview of the LANL architecture
[Architecture diagram. Labels: publishers and Indata.lanl.gov delivering to Pre-Ingest; Ingest; OAI-PMH repositories for FTXT, A&I and TechReport; Repo Index (OAI-PMH); Identifier Resolver (CNRI handle, Java, C); OAI-PMH and OpenURL access for the application, with the MPEG-21 DIP Engine, Registry of transformations and DID Profile/Behavior Registry (DID with DIM). Components are numbered 1 to 7.]

5 Pre-Ingest: Data input from vendor (architecture step 1)
Data feeds from third parties:
  o Are delivered in various ways (http, ftp, ...)
  o Have many different formats
  o Upon delivery, are stored in pre-ingestion area
  o Typically contain many items in a single feed

6 Ingest: Creation of DIDs & XMLtapes (architecture step 2)
Pre-ingestion area is monitored for deliveries
New deliveries are processed for ingestion:
  o An MPEG-21 DIDL object (a DID) is created per delivered item.
  o All DIDs of the delivery are concatenated into a single XML file: the XMLtape

7 MPEG-21 DIDL - 1. Data Model (architecture step 2)
Abstract Definitions + W3C XML Schema
Entities
  o a Container: didl:Container
  o an Item: didl:Item
  o a Component: didl:Component
  o a Resource: didl:Resource
  o a Descriptor: didl:Descriptor
  o …
Remarks
  o Defined LANL DIDL profile, remaining fully DIDL compliant
  o We create concrete DIDL profiles ‘per collection’
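
The nesting of the DIDL entities listed above can be illustrated with a minimal Python sketch. It is not the LANL ingest code: the namespace URI, the helper names q and build_did, and the example.org reference are assumptions made for illustration, and the element layout should be checked against the DIDL schema actually in use.

```python
import xml.etree.ElementTree as ET

# Assumption: namespace URI chosen for illustration; verify against the DIDL schema in use.
DIDL_NS = "urn:mpeg:mpeg21:2002:02-DIDL-NS"
ET.register_namespace("didl", DIDL_NS)


def q(tag):
    """Return the namespace-qualified name for a DIDL element (hypothetical helper)."""
    return f"{{{DIDL_NS}}}{tag}"


def build_did(resource_ref, mime_type):
    """Build DIDL > Container > Item > (Descriptor, Component > Resource)."""
    didl = ET.Element(q("DIDL"))
    container = ET.SubElement(didl, q("Container"))
    item = ET.SubElement(container, q("Item"))
    # A Descriptor carries secondary information about its parent entity.
    descriptor = ET.SubElement(item, q("Descriptor"))
    statement = ET.SubElement(descriptor, q("Statement"), {"mimeType": "text/plain"})
    statement.text = "secondary information about the Item"
    # A Component groups a Resource, which points to the actual datastream.
    component = ET.SubElement(item, q("Component"))
    ET.SubElement(component, q("Resource"), {"mimeType": mime_type, "ref": resource_ref})
    return didl


if __name__ == "__main__":
    did = build_did("http://example.org/LA-9870.pdf", "application/pdf")
    print(ET.tostring(did, encoding="unicode"))
```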

8 MPEG-21 DIDL - 1. Data Model (continued; architecture step 2)

9 MPEG-21 DIDL - 2. Descriptors (architecture step 2)
Secondary information pertaining to Entities
  o MPEG-21 defined uses
    - identification information: MPEG-21 Part 3: DII
    - rights information: MPEG-21 Part 5: REL / Part 4: IPMP
    - processing information: MPEG-21 Part 10: DIP
  o community/application specific uses
    - cf. use of Descriptors in LANL profile

10 MPEG-21 DIDL - 2. Descriptors - Identifiers (architecture step 2)
Example Descriptor (XML markup lost in extraction): … urn:isbn:0-395-36341-1 …
Callout: MPEG-21 dii:Identifier

11 MPEG-21 DIDL - 2. Descriptors - rights (architecture step 2)
Example Descriptor (XML markup lost in extraction): … Copyright 2003; American Physical Society …
Callout: MPEG-21 r:license

12 MPEG-21 DIDL - 2. Descriptors - behaviors (architecture step 2)
Example Descriptors (XML markup lost in extraction):
  o Content Item: … urn:foobar:Argument … (callout: MPEG-21 dip:ObjectType)
  o Processing Item: … urn:foobar:Argument function PlayTrack() { } … (callout: MPEG-21 dip:Argument)

13 MPEG-21 DIDL LANL Profile (architecture step 2)
2 questions:
  o How to map datastreams of complex objects of the LANL repository to the DIDL data model
  o How to use Descriptors to meet the design goals of the repository and its associated applications
LANL DID profile, explained by means of the following example:
  o A complex object consisting of
    - LANL technical report
      – 1 file: pdf
      – id = info:lanl-repo/tr/LA-9870
    - metadata record
      – 2 versions: raw MARC record and derived MARCXML file
      – id = info:lanl-repo/opac/LANLb10012271

14 LANL Profile representation (architecture step 2)
[Diagram. Labels: LANL technical report; MARC record]

15 Other characteristics of LANL DIDL Profile (architecture step 2)
Bitstream handling:
  o Inline XML data (such as MARCXML, …)
  o Pointers to bitstreams stored in ARC files
  o DIDs become uniform proxies to the heterogeneous ‘real stuff’
LANL DIDL Profile & collection Profiles enforced using Schematron
Digests (DID-level & bitstream-level) included in DIDs (W3C XML Signature)
Handling of identifiers:
  o DID identifiers ~ XML structure
  o Content identifiers ~ actual content
Creation dates
  o XML documents and constituent XML elements
  o datastreams
Collections
‘Format’ information (in addition to DIDL mimeType)
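
As a rough illustration of a bitstream-level digest, the sketch below computes a base64-encoded SHA-1 value of the kind carried in a W3C XML Signature DigestValue element, and checks a bitstream against it. The choice of SHA-1 and the function names digest_value and verify are assumptions; the slides do not state which algorithm the LANL profile uses, and DID-level digests would additionally require XML canonicalization, which is not shown.

```python
import base64
import hashlib


def digest_value(data: bytes) -> str:
    """Return the base64-encoded SHA-1 digest of a bitstream (algorithm is an assumption)."""
    return base64.b64encode(hashlib.sha1(data).digest()).decode("ascii")


def verify(data: bytes, expected: str) -> bool:
    """Check a bitstream against a digest value recorded earlier, e.g. inside a DID."""
    return digest_value(data) == expected


if __name__ == "__main__":
    bitstream = b"%PDF-1.4 example bitstream"
    recorded = digest_value(bitstream)
    print(recorded, verify(bitstream, recorded))
```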

16 Identifiers in LANL DIDL Profile (architecture step 2)
2 Types of identifiers
DID-identifier ~ identifier(s) associated with XML document/structure (@DIDid; info URI, XML IDs)
  o DIDL root level: info:lanl-repo/i/UUID1
  o Container-level: info:lanl-repo/i/UUID1#UUIDx
  o Item-level: info:lanl-repo/i/UUID1#UUIDy
  o Component-level: info:lanl-repo/i/UUID1#UUIDz
Content-identifier ~ identifier associated with content (MPEG-21 DII Descriptor)
  o Item-level: info:lanl-repo/tr/LA-9870
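
The DID-identifier pattern above (a root-level info URI, with an XML ID appended after "#" for nested elements) can be sketched as follows. The function names are hypothetical and the UUID placeholders are taken from the slide; how LANL actually mints the UUIDs is not stated.

```python
PREFIX = "info:lanl-repo/i/"


def did_identifier(did_uuid: str) -> str:
    """Root-level DID identifier, e.g. info:lanl-repo/i/UUID1."""
    return f"{PREFIX}{did_uuid}"


def element_identifier(did_uuid: str, xml_id: str) -> str:
    """Identifier of a Container, Item or Component inside a DID."""
    return f"{did_identifier(did_uuid)}#{xml_id}"


def split_identifier(identifier: str):
    """Split a DID identifier into (root identifier, XML ID or None)."""
    root, _, fragment = identifier.partition("#")
    return root, (fragment or None)


if __name__ == "__main__":
    print(element_identifier("UUID1", "UUIDy"))
    print(split_identifier("info:lanl-repo/i/UUID1#UUIDy"))
```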

17 Identifiers in LANL DIDL Profile (architecture step 2)
[Diagram. Labels: @DIDid info:lanl-repo/i/UUID1; XML IDs #UUIDx, #UUIDy, #UUIDz, #UUIDb, #UUIDa; dii info:lanl-repo/tr/LA-9870 (LANL technical report); dii info:lanl-repo/opac/LANLb10012271 (MARC record)]


19 Creation dates in LANL DIDL Profile (architecture step 2)
[Diagram. Labels: LANL technical report; MARC record; @DIDcreated 2004-04-28T15:42:16Z; dcterms.created 2002-01-29T13:04:21Z]

20 Collections in LANL DIDL Profile (architecture step 2)
[Diagram. Labels: LANL technical report; MARC record; dcterms.isPartOf info:sid/library.lanl.gov:TR; dcterms.isPartOf info:sid/library.lanl.gov:OPAC]

21 Indication of bibliographic data in LANL DIDL Profile (architecture step 2)
[Diagram. Labels: LANL technical report; MARC record; dc.type http://purl.org/dc/terms/bibliographicCitation]

22 Indication of rights in LANL DIDL Profile (architecture step 2)
[Diagram. Labels: LANL technical report; MARC record; dc.rights: textual statement]

23 ‘Formats’ in LANL DIDL Profile (architecture step 2)
[Diagram. Labels: LANL technical report; MARC record]
  o dc.format info:lanl-repo/fmt/1
  o dc.format info:lanl-repo/pro/metadata
  o dc.format info:lanl-repo/pro/ai
  o content-stream:text:structured-text:mark-up-lang:xml#application/marc+xml

24 ‘Formats’ as placeholder for dynamic behaviors (architecture step 2)
[Diagram: dynamic insertion of behaviors. A stored DID carrying … info:lanl-repo/fmt/1 … is combined with the Profile/Behavior Registry to produce the disseminated DID.]
Descriptors shown (XML markup lost in extraction):
  o Processing Item: … urn:foobar:Argument function PlayTrack() { } … (callout: MPEG-21 dip:Argument)
  o Content Item: … urn:foobar:Argument … (callout: MPEG-21 dip:ObjectType)

25 XMLTape: XML wrapper for DIDs (architecture step 2)
XMLTape: sequential storage of DIDs
  o Each DID in the XMLTape is stored with its DID-identifier and datestamp of creation
DID content: inline metadata; inline XML; secondary information; pointers to content
DID resources in ARC files
Zipped Index (@DIDid and @DIDcreated) based on byte offset and byte count in zipped file
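
The idea of sequential storage with an index based on byte offset and byte count can be sketched in a few lines of Python. This is an illustration only, not the LANL XMLTape format: the wrapper element name tape, the in-memory index, and the absence of compression are all assumptions made to keep the example small.

```python
import io


def write_tape(records):
    """Write DIDs sequentially and index them by identifier.

    records: iterable of (did_id, created, xml_string).
    Returns (tape_bytes, index), where index maps did_id to
    (created, byte_offset, byte_count).
    """
    buf = io.BytesIO()
    index = {}
    buf.write(b"<tape>\n")  # hypothetical wrapper element
    for did_id, created, xml in records:
        data = xml.encode("utf-8")
        offset = buf.tell()  # byte offset of this DID within the tape
        buf.write(data)
        buf.write(b"\n")
        index[did_id] = (created, offset, len(data))
    buf.write(b"</tape>\n")
    return buf.getvalue(), index


def read_did(tape: bytes, index, did_id: str) -> str:
    """Fetch one DID from the tape without parsing the whole file."""
    _, offset, count = index[did_id]
    return tape[offset:offset + count].decode("utf-8")


if __name__ == "__main__":
    tape, index = write_tape([
        ("info:lanl-repo/i/UUID1", "2004-04-28T15:42:16Z", "<DIDL>first</DIDL>"),
        ("info:lanl-repo/i/UUID2", "2004-04-29T00:00:00Z", "<DIDL>second</DIDL>"),
    ])
    print(read_did(tape, index, "info:lanl-repo/i/UUID2"))
```

Because the index stores byte counts rather than character counts, lookups stay correct when a DID contains non-ASCII text.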

26 ARC: sequential storage of DID Resources (architecture step 2)
[Diagram. Labels: XMLTape; DID; ARC; resource]
ARC Index:
  arc id 1 | ARC pointer 1
  arc id 2 | ARC pointer 2
  arc id 3 | ARC pointer 3

27 Overview of the LANL architecture
[Architecture diagram as on slide 4, with Verity shown as the application.]

28 Making DIDs accessible through the OAI-PMH (architecture step 3)
[Diagram. Labels: LANL; publishers; A&I; Ingest; Expose; baseURL(1) techReport; baseURL(2) A&I; baseURL(3) FTXT]
  o OAI-PMH identifier = @DIDid
  o OAI-PMH datestamp = @DIDcreated
  o OAI-PMH response = DIDs
  o OAI-PMH sets
    - Collection = dcterms.isPartOf
    - Profile ~ Digital Format Identifier = dc.format
Example
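
A harvester addressing one of these repositories would issue standard OAI-PMH requests. The sketch below only builds a ListRecords request URL; the base URL, the metadataPrefix value "didl" and the function name are assumptions, since the slides do not give the actual values.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix, set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    set_spec would select a collection or format; from_date limits the
    harvest to records whose datestamp (here @DIDcreated) is not earlier.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    if from_date:
        params["from"] = from_date
    return f"{base_url}?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical base URL and metadata prefix.
    print(list_records_url("http://example.org/oai", "didl", from_date="2004-04-28"))
```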

29 Overview of the LANL architecture
[Architecture diagram as on slide 4, with Verity shown as the application.]

30 Repository Index: keeping track of OAI-PMH repositories (architecture step 4)
[Diagram. Labels: Repo Index at baseURL(index); Expose; baseURL(1) techReport; baseURL(2) A&I]
Repository Index:
  baseURL(1) | 2003-02-20
  baseURL(2) | 2003-01-15
  baseURL(3) | 2002-11-12
STEP 1: ListIdentifiers (OAI-PMH)
STEP 2: ListRecords (OAI-PMH): list of DIDs
Example 1
Example 2
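
One way to read the two steps is that a harvester first asks the Repository Index which repositories changed, then harvests only those. The sketch below covers the first step under the assumption that each header identifier returned by the index is a repository baseURL and each datestamp is that repository's last update; the function names and the sample values are hypothetical. Only the OAI-PMH 2.0 response namespace is taken from the protocol itself.

```python
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"


def parse_headers(xml_text):
    """Extract (identifier, datestamp) pairs from an OAI-PMH ListIdentifiers response."""
    root = ET.fromstring(xml_text)
    return [
        (h.findtext(f"{OAI}identifier"), h.findtext(f"{OAI}datestamp"))
        for h in root.iter(f"{OAI}header")
    ]


def repositories_to_harvest(index_response, since):
    """Return baseURLs whose datestamp is on or after `since` (YYYY-MM-DD).

    ISO 8601 dates of equal granularity compare correctly as strings.
    """
    return [base for base, stamp in parse_headers(index_response) if stamp >= since]


if __name__ == "__main__":
    sample = (
        '<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"><ListIdentifiers>'
        "<header><identifier>http://example.org/tr</identifier>"
        "<datestamp>2003-02-20</datestamp></header>"
        "<header><identifier>http://example.org/ai</identifier>"
        "<datestamp>2003-01-15</datestamp></header>"
        "</ListIdentifiers></OAI-PMH>"
    )
    print(repositories_to_harvest(sample, "2003-02-01"))
```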

31 Repository Index (architecture step 4)
Registration of OAI-PMH repository in Repository Index is done during architecture step 2
Implementation:
  o Generic MySQL-based OAI-PMH repository with OCLC’s OAICat as front-end

32 Overview of the LANL architecture
[Architecture diagram as on slide 4, with Verity shown as the application.]

33 Identifier Resolver: locating DIDs and DID Items/Resources (architecture step 5)
[Diagram. Labels: Identifier Resolver monitors; Repo Index at baseURL(index); Expose; baseURL(2) A&I; techReport; input: DID-id or content-id; output: baseURL & DID-id; DID-id; Content-id; ark id]
Identifier Resolver:
  identifier   | datestamp  | repository
  DID-id 1     | 2003-02-20 | baseURL(1) & DID-id 1
  Content-id 1 | 2003-01-15 | baseURL(2) & DID-id x
  Content-id 2 | 2002-11-12 | baseURL(x) & DID-id y

34 Identifier Resolver with history for Content Identifiers (architecture step 5)
Columns: Identifier | Repository Location (baseURL | protocol | Repository Id | Extension (XML ID))
  info:lanl-repo/i/UUID1            | baseURL1 | OAI-PMH | info:lanl-repo/i/UUID1 |
  info:lanl-repo/opac/LANLb10012271 | baseURL1 | OAI-PMH | info:lanl-repo/i/UUID1 | UUID2
  info:lanl-repo/tr/LA-9870         | baseURL1 | OAI-PMH | info:lanl-repo/i/UUID1 | UUID3
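
The resolver table can be modelled as a simple lookup from an identifier to a repository location. The sketch below loads the three rows shown on the slide into a dictionary; the Location type, the resolve and target functions, and the use of "#" to join the repository id and the XML ID are assumptions for illustration, not the LANL implementation.

```python
from typing import NamedTuple, Optional


class Location(NamedTuple):
    base_url: str
    protocol: str
    repository_id: str      # DID identifier used inside the repository
    xml_id: Optional[str]   # XML ID of the element within the DID, if any


# Rows copied from the slide; "baseURL1" is the slide's placeholder, not a real URL.
RESOLVER = {
    "info:lanl-repo/i/UUID1":
        Location("baseURL1", "OAI-PMH", "info:lanl-repo/i/UUID1", None),
    "info:lanl-repo/opac/LANLb10012271":
        Location("baseURL1", "OAI-PMH", "info:lanl-repo/i/UUID1", "UUID2"),
    "info:lanl-repo/tr/LA-9870":
        Location("baseURL1", "OAI-PMH", "info:lanl-repo/i/UUID1", "UUID3"),
}


def resolve(identifier: str) -> Location:
    """Look up a DID identifier or content identifier; raises KeyError if unknown."""
    return RESOLVER[identifier]


def target(location: Location) -> str:
    """Identifier of the DID, or of the element inside it, that holds the content."""
    if location.xml_id is None:
        return location.repository_id
    return f"{location.repository_id}#{location.xml_id}"


if __name__ == "__main__":
    print(target(resolve("info:lanl-repo/tr/LA-9870")))
```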

35 Identifier Resolver (architecture step 5)
Is loaded:
  o Through OAI-PMH harvesting for ‘regular’ updating.
  o Through batch mechanism for bulk loading of new collections
Example 1: select identifiers
Example 2: resolve identifiers

36 Overview of the LANL architecture
[Architecture diagram as on slide 4, with Verity shown as the application.]

37 OAI-PMH Federator: single point of access to DIDs (architecture step 6)
[Diagram. Labels: Expose; A&I; techReport; FTXT; OAI-PMH Federator; MPEG-21 DIP Engine; Registry of transformations; Profile/Behavior Registry; DID with PI]
OAI-PMH DID-level access
Disseminated formats: DID, METS, SCORM, …
OAI-PMH sets:
  o baseURL: set = baseURL(1), set = baseURL(2), set = baseURL(3)
  o Collection
  o Format

38 DIM Inserter: dynamic insertion of behaviors

39 OAI-PMH Federator (architecture step 6)
Exposes complete LANL repository as a single OAI-PMH repository.
OAI-PMH Federator provides:
  o Single point of access
  o Facility to transform stored DIDs (e.g. identifiers only)
Downstream applications define harvesting projects to collect data. E.g.:
  o Verity
  o Identifier Resolver
  o Netrics
Harvesting projects specify values for OAI-PMH parameters
Implementation: based on OCLC’s OAIHarvester I
Example 1

40 OpenURL access to Items across repositories (architecture step 7)
[Diagram. Labels: Expose; A&I; techReport; FTXT; OAI-PMH; OpenURL; MPEG-21 DIP Engine; Registry of transformations; Profile/Behavior Registry; DID with PI; transformed content]
OpenURL: Requester, …, ServiceType, Referent
OpenURL Item-level and DID-level access

41 OpenURL-based disseminations (architecture step 7)
Disseminate DIDs, DID items and transforms thereof
Example: the OpenURL front-end
  http://gws.lanl.gov:9080/openurl-servlet/test?
  & rfr_id=info:sid/library.lanl.gov
  & url_ver=Z39.88-2004
  & rft_id=info:lanl-repo/biosis/PREV196905076682
  & svc_id=info:lanl-repo/svc/tomods.marc
Example 1: extract MARCXML
Example 2: extract BIOSIS XML
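
The request shown above can be assembled programmatically. The sketch below builds the same query from its four keys with percent-encoding applied; the function name and the decision to encode the values are assumptions, and the servlet address on the slide dates from 2004 and is unlikely to still respond.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def openurl(resolver, rft_id, svc_id, rfr_id="info:sid/library.lanl.gov"):
    """Build an OpenURL request: referrer, referent and requested service."""
    params = {
        "url_ver": "Z39.88-2004",
        "rfr_id": rfr_id,   # who is asking
        "rft_id": rft_id,   # which item is wanted
        "svc_id": svc_id,   # which dissemination/transform is wanted
    }
    return f"{resolver}?{urlencode(params)}"


if __name__ == "__main__":
    url = openurl(
        "http://gws.lanl.gov:9080/openurl-servlet/test",
        rft_id="info:lanl-repo/biosis/PREV196905076682",
        svc_id="info:lanl-repo/svc/tomods.marc",
    )
    print(url)
    print(parse_qs(urlsplit(url).query))
```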

42 Summary of repository access methods
Access method                     | DID-level                            | Item-level
OAI-PMH – individual repositories | DIDL                                 | ---
OAI-PMH – Federator               | DIDL, METS, SCORM, IMS, ToC, …       | ---
OpenURL Gateway                   | DIDL, METS, SCORM, ToC, …            | Transforms of content
OAIS equivalent                   | Dissemination Information Package(s) | Result Set

43 Discussion of potential impact
Use of OAI-PMH & complex objects opens new realm of possibilities
OAI-PMH to recurrently transfer digital objects (represented as complex objects) between environments.
Transferred package (DIP at source; SIP at target) is independent of repository infrastructure at both ends.
Transferred package can contain digests that allow parties involved to recurrently check for bit-level issues.
  o Data feeds from publishers (cf. LANL/APS/LoC NDIIPP project)
  o From IR to trusted archives: archiving (cf. DARE, KB)
  o Between trusted archives: mirroring (cf. LANL/APS/LoC NDIIPP project)
  o Services based on content (cf. DARE, FAIR, DINI)

44 Example 1: LoC – APS – LANL project
Funded by Library of Congress NDIIPP
OAI-PMH harvesting of APS content for ingestion in LANL & LoC repositories
Maps APS content to MPEG-21 DIDL structure
Ongoing work for inclusion of digest/signatures in DIDLs
Example 1

45 Example 2: Old Dominion University & LANL mod_oai project
Funded by Andrew W. Mellon Foundation
Implement OAI-PMH plug-in for Apache Web servers
Will allow selective & incremental OAI-PMH harvesting of content hosted by Web servers
  o datestamp
  o sets ~ MIME type
  o initially static Web content
  o OAI-PMH identifiers == URLs
Two operating modes for crawlers:
  o General crawler: ListIdentifiers => URLs of Web content
  o Advanced crawler: ListRecords ~ Dublin Core and one or more complex object formats
OAI-PMH as a tool to make Web harvesting more efficient
Example 1

46 Example 3: LANL DSpace plug-in prototype
Introduced at recent DSpace Federation meeting
Maps DSpace data model [item – bundle – component] to MPEG-21 DIDL data model [Container – Item – Resource]
Exposes MPEG-21 DIDL documents through built-in DSpace OAI-PMH infrastructure
Metadata (Dublin Core) and Content (MPEG-21 DIDL) harvestable via the OAI-PMH
Example 1

