Open Archives Initiative: Where we are, Where we are going. Carl Lagoze, 4th OAF Workshop, September 2003.


1 Open Archives Initiative Where we are, Where we are going Carl Lagoze 4th OAF Workshop September, 2003

2 Where we are now De facto standard for Internet information exchange Deployed extensively and internationally –(digital) libraries –Museums –Eprint repositories –Research projects

3 Protocol Stability OAI-PMH has been stable since release –No functional changes, just typographic edits –Validation of leadership/participation model No plans for a 3.0 release –Core protocol will not be extended –Minor 2.x release could occur (more later) –Additional implementation guidelines (more later)

4 NSDL and OAI-PMH

5 The NSDL Context National STEM (Science, Technology, Engineering, Mathematics, Medicine) Digital Library Major National Science Foundation project targeted at the application of web and Internet to (STEM) education $25M over six years to over 100 projects –Collections –Services –Targeted Research –Core Integration

6 NSDL technical guidelines Aggregation rather than collection –Core integration team will not manage any collections Spectrum of interoperability –Accommodate diversity of participation models –Open interfaces and standards permitting plug-in of an array of value-added services One library, many portals –Accommodate multiple quality and selection metrics –Tailor presentation of content and nature of services to audience needs

7 Spectrum of interoperability
Level | Agreements | Example
Federation | Strict use of standards (syntax, semantic, and business) | AACR, MARC, Z39.50
Harvesting | Digital libraries expose metadata; simple protocol and registry | Open Archives metadata harvesting
Gathering | Digital libraries do not cooperate; services must seek out information | Web crawlers and search engines

8 Translating to initial goals This is a big task that no one has done before! Work on the priorities –Focus on one point on spectrum of interoperability Metadata harvesting Incorporate NSF funded collections and selected other collections –Leverage existing (or at least emerging) technologies and protocols OAI, uPortal, Shibboleth, SDLIP, InQuery –Provide reliable base level services Search and Discovery, Access Management, User Profiles, Exemplary Portals, Persistence Plant some seeds for the future –Machine-assisted metadata generation –Automated collection aggregation –Web gathering strategies

9 Metadata Repository Central storage of all metadata about all resources in the NSDL –Defines the extent of NSDL collection –Metadata includes collections, items, annotations, etc. MR main functions –Aggregation –Normalization –Redistribution Ingest of metadata by various means –Harvesting, manual, automatic, cross-walking Open access to MR contents for service builders via OAI-PMH

10 Metadata Strategy Collect and redistribute any native (XML) metadata format Provide crosswalks to Dublin Core from standard formats –DC-GEM, LTSC (IMS), ADL (SCORM), MARC, FGDC, EAD Concentrate on collection-level metadata Use automatic generation to augment item-level metadata
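
A crosswalk of the kind named on this slide can be pictured as a field-to-element mapping. The sketch below is only an illustration: the source-side field names (`Title`, `Author`, and so on) are hypothetical and do not come from any of the listed formats, and fields without a Dublin Core equivalent are simply dropped.

```python
# Hypothetical mapping from native field names to unqualified Dublin Core
# element names. The left-hand names are illustrative only.
CROSSWALK = {
    "Title": "title",
    "Author": "creator",
    "Keywords": "subject",
    "Abstract": "description",
    "PubDate": "date",
}


def to_dublin_core(native):
    """Map a native record (field -> value or list of values) to DC."""
    dc = {}
    for field, value in native.items():
        element = CROSSWALK.get(field)
        if element is None:
            continue  # no DC equivalent in this sketch: the field is dropped
        values = value if isinstance(value, list) else [value]
        dc.setdefault(element, []).extend(values)
    return dc
```

Every DC element is kept as a list because unqualified Dublin Core elements are repeatable.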

11 Importing metadata into the MR Collections Harvest Staging area Cleanup and crosswalks Database load Metadata Repository
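
The import path on this slide (harvest, staging area, cleanup and crosswalks, database load) might be sketched as below. This is a toy, in-memory version under stated assumptions: `harvest` stands in for a real OAI-PMH harvest, the repository is a plain dictionary rather than a database, and all function names are hypothetical.

```python
def harvest(collection):
    """Stand-in for a real harvest: return the collection's raw records."""
    return list(collection)


def cleanup(record):
    """Strip surrounding whitespace and drop fields that end up empty."""
    cleaned = {}
    for key, value in record.items():
        value = value.strip()
        if value:
            cleaned[key] = value
    return cleaned


def import_into_repository(collections, repository, crosswalk=lambda r: r):
    """Harvest -> staging -> cleanup and crosswalk -> load."""
    staging = []
    for name, collection in collections.items():
        staging.extend((name, record) for record in harvest(collection))
    for name, record in staging:
        converted = crosswalk(cleanup(record))
        if "identifier" not in converted:
            continue  # records without an identifier cannot be loaded
        repository[converted["identifier"]] = dict(converted, collection=name)
    return repository
```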

12 Exporting metadata from the MR

13 NSDL and OAI-PMH Two years later Concepts are good, practice is hard Issues –Metadata is hard http://www.well.com/~doctorow/metacrap.htm –XML is hard –Protocols are hard Static repositories (more later) –IP is relevant (more later)

14 Some Essential Metadata Questions Review original (DC) metadata assumptions –Metadata is essential for good resource discovery –Joe Sixpack could create metadata Account for current realities –2003 is not 1994 –Google, etc. keep getting better

15 Metadata Space

16 Metadata Triage

17 Reconsidering the Dublin Core Requirement Questions about utility of unqualified DC –The conundrum…. Specification too loose to serve intended interoperability goal But more complex metadata may be too hard Limited energy for interoperability –Data providers implement required DC at expense of better metadata Use of protocol for purposes other than resource discovery

18 Rethinking record-oriented model Implications for record-oriented harvesting????

19 Topology Evolution Simple Data Provider, Service Provider Topology

20 Topology Evolution (cont.) Metadata Aggregator

21 Topology Evolution (cont.) OAI-PMH p2p network

22 OAI-P2pMH Issues Document (metadata) location –Exploit unique identifiers, use efficient key-based location mechanisms (distributed hash tables) Provenance-based queries –Metadata records may go through refinement and/or translation phases as they move through value-added aggregators. –Exploit provenance guidelines Network harvesting –Broadcast query (Gnutella) inefficient –Exploit techniques for efficient routing of queries (P-trees)
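
Key-based location with a distributed hash table can be illustrated with a small consistent-hashing ring: the unique identifier is hashed, and the record is assigned to the first peer at or after that position. This is a single-process sketch, not a networked DHT; the class name and the choice of SHA-1 are assumptions made for the example.

```python
import bisect
import hashlib


def ring_position(key):
    """Hash a string to an integer position on the ring."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class PeerRing:
    """Toy consistent-hashing ring mapping identifiers to peers."""

    def __init__(self, peers):
        self._ring = sorted((ring_position(p), p) for p in peers)
        self._positions = [pos for pos, _ in self._ring]

    def locate(self, identifier):
        """Return the peer responsible for this identifier."""
        i = bisect.bisect_left(self._positions, ring_position(identifier))
        return self._ring[i % len(self._ring)][1]  # wrap around the ring
```

Any peer that knows the ring can compute the same answer locally, which is what avoids a Gnutella-style broadcast.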

23 OAI-PMH and Intellectual Property Protocol exists in a context where information providers have concerns about use of intellectual property OAI-PMH is nominally about metadata, but… –Rich metadata is an intellectual product –The protocol can be used to transmit anything (e.g. content) that can be encoded in XML –Generally metadata leads to content so….

24 OAI-rights effort Goal is to investigate and develop means of expressing rights about metadata and resources in the OAI framework. The result will be an addition to the OAI implementation guidelines that specifies mechanisms for rights expressions within OAI-PMH. –No changes to core protocol

25 OAI-rights Effort (cont.) Extensible, providing a general framework for expressing rights statements within OAI-PMH. –Not an effort to develop a new rights expression language Use Creative Commons licenses as a motivating and deployable example. Release of specification by 2nd quarter 04 Invited OAI-rights group –Standard OAI development model

26 Dimensions of OAI-PMH and rights Entity Association Metadata: concern in NSDL for (re)use of rich metadata Content: predominant application of the protocol to resource discovery and ultimate access makes this important

27 Dimensions of OAI-PMH and rights Aggregation Association OAI-PMH aggregations –Repository –Set –Item Rights association with an aggregation may provide shortcut (e.g., the rights for all resources in a repository/set…) Cost of shortcut is pseudo-statefulness, possibly complex overriding rules
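
One possible overriding rule for rights attached at the three aggregation levels is "most specific wins": item, then set, then repository. The slide does not fix any rule, so the precedence below is purely an assumption, as are the function name and the shape of the `rights` dictionary.

```python
def resolve_rights(item_id, set_specs, rights):
    """Return the rights statement that applies to an item.

    `rights` is a hypothetical structure:
    {"repository": ..., "set": {setSpec: ...}, "item": {identifier: ...}}
    Assumed precedence: item-level, then set-level, then repository-level.
    """
    if item_id in rights.get("item", {}):
        return rights["item"][item_id]
    for spec in set_specs:
        if spec in rights.get("set", {}):
            return rights["set"][spec]
    return rights.get("repository")
```

A harvester has to remember the repository-level and set-level statements between requests to apply them, which is the pseudo-statefulness the slide warns about.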

28 Dimensions of OAI-PMH and rights Binding Choices –exploit mechanisms in metadata formats, e.g., DC-rights –restrict the rights statements to some more specific protocol mechanism –allow some mixture of these methods. DC-rights problems –Semantics is restricted to rights about resource –Can't embed XML in dc value –What if DC is not required? Burden on harvesters if rights embedding is not explicit but scattered across several locations

29 OAI-PMH Static Repositories Provide a lightweight mechanism for data provider participation Intended for relatively small and static collections Two components –Static Repository XML format Semantically equivalent to Identify and ListRecords Invisible to harvester –Static Repository Gateway Virtual data provider for static repository data Unique baseURL for each contained static repository
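
The two components can be pictured with a toy gateway that loads a static XML file and answers Identify-like and ListRecords-like questions under a per-repository base URL. The XML layout, the URL scheme, and the class below are simplified assumptions for illustration and do not follow the actual Static Repository format.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified static file (not the real Static Repository schema).
STATIC_FILE = """<repository name="Example">
  <record identifier="oai:example:1" datestamp="2003-01-15"><title>First</title></record>
  <record identifier="oai:example:2" datestamp="2003-06-30"><title>Second</title></record>
</repository>"""


class StaticRepositoryGateway:
    """Toy gateway acting as a virtual data provider for static files."""

    def __init__(self, gateway_url):
        self.gateway_url = gateway_url.rstrip("/")
        self._repositories = {}

    def register(self, name, xml_text):
        """Load a static file and return the base URL assigned to it."""
        base_url = f"{self.gateway_url}/{name}"
        self._repositories[base_url] = ET.fromstring(xml_text)
        return base_url

    def identify(self, base_url):
        return self._repositories[base_url].get("name")

    def list_records(self, base_url, from_date=None):
        """Return identifiers, optionally only those changed since from_date."""
        root = self._repositories[base_url]
        return [
            r.get("identifier")
            for r in root.findall("record")
            if from_date is None or r.get("datestamp") >= from_date
        ]
```

Datestamps are compared as strings, which works here because they are all in the same YYYY-MM-DD form.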

30 Static Repositories and Static Repository Gateway

31 Static Repositories Open Issue Relationship to RSS?????

32 Conclusions Interoperability and lowest common denominator Rapid advances in automated methods –Moore's law –Smart algorithms –Benefits of issues of scale Combining human effort and automated methods –Extracting order from chaos –Learning from order Move beyond resource discovery

33 Typical Values repository –collection of publications resource –scholarly publication item –all metadata (DC + MARC) record –a single metadata format datestamp –last update / addition of a record metadata format –bibliographic metadata format set –originating institution or subject categories
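
The relationship between item, record, datestamp, metadata format, and set can be written down as a small data model. The class and field names below are illustrative choices, not part of the protocol.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    """Metadata about a resource in one single format."""
    metadata_prefix: str  # e.g. "oai_dc" or "oai_marc"
    datestamp: str        # last update or addition of this record
    metadata: dict


@dataclass
class Item:
    """All metadata about one resource, across formats."""
    identifier: str
    set_specs: list = field(default_factory=list)  # e.g. institution, subject
    records: dict = field(default_factory=dict)    # metadata_prefix -> Record

    def add_record(self, record):
        self.records[record.metadata_prefix] = record

    def get_record(self, metadata_prefix):
        """Return the record in the requested format, or None."""
        return self.records.get(metadata_prefix)
```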

34 Repositories… Stretching the idea of a repository a bit: –contextually sensitive repositories personalization for harvesters communication between strangers, or communication between friends? –OAI-PMH for individual complex objects? OAI-PMH without MySQL?! –Fedora, Multi-valent documents, buckets –tar, jar, zip, etc. files

35 Resource What if resource were: –computer system status uptime, who, w, df, ps, etc. –or generalized system status e.g., sports league standings –people personnel databases authority files for authors

36 Item What if item were: –software union of versions + formats –all forms of metadata administrative + structural citations, annotations, reviews, etc. –data e.g., newsfeeds and other XML expressible content –metadataPrefixes or sets could be defined to be different versions

37 Record What if record were: –specific software instantiations / updates –access / retrieval logs for DLs (or computer systems) –push / pull model inversion put a harvester on the client behind a firewall, the client contacts a DP and receives instructions on how to submit the desired document (e.g., send email to a specified address)

38 Datestamp semantics of datestamp are strongly influenced by the choice of resource / item / record / metadataPrefix, but it could be used to: –signify change of set membership (e.g., workflow: item moves from submitted to approved) –change datestamp to reflect access to the DP e.g., in conjunction with metadataPrefixes of accessed or mirrored
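
The workflow idea (a datestamp that changes when set membership changes) can be sketched as follows: moving an item from "submitted" to "approved" updates its datestamp, so an incremental harvest of the "approved" set picks it up. Items are plain dictionaries here, and the function names are hypothetical.

```python
def move_to_set(item, new_set, today):
    """Change set membership and record the change in the datestamp."""
    item["set"] = new_set
    item["datestamp"] = today


def list_identifiers(items, set_spec=None, from_date=None):
    """Selective harvest by set and by datestamp (YYYY-MM-DD strings)."""
    return [
        item["identifier"]
        for item in items
        if (set_spec is None or item["set"] == set_spec)
        and (from_date is None or item["datestamp"] >= from_date)
    ]
```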

39 metadataPrefix what if metadataPrefix were: –instructions for extracting / archiving / scraping the resource verb=ListRecords&metadataPrefix=extract_TIFFs –code fragments to run locally (harvested from a trusted source!) –XSLT for other metadataPrefixes branding container is at the repository-level, this could be record- or item-level
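
A request such as the one on this slide is an ordinary HTTP query string appended to a repository's base URL. The helper below is a minimal sketch; the base URL is a placeholder, and `extract_TIFFs` is the slide's speculative prefix, not a real metadata format.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, **arguments):
    """Build an OAI-PMH request URL from a verb and its arguments."""
    params = {"verb": verb}
    # Skip arguments that were left unset.
    params.update({k: v for k, v in arguments.items() if v is not None})
    return base_url + "?" + urlencode(params)


# Placeholder base URL, used only for illustration.
url = build_request(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="extract_TIFFs",
)
```

Because `from` is a Python keyword, that argument has to be passed as `**{"from": "2001-01-01"}` with this helper.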

40 Set sets are already used for tunneling OAI-PMH extensions (see Suleman & Fox, D-Lib 7(12)) other uses: –in aggregators, automatically create 1 set per baseURL –have hidden sets (or metadataPrefix) that have administrative or community-specific values (or triggers) set=accessed>1000&from=2001-01-01 set=harvestMeWithTheseARGS&until=2002-05-05&metadataPrefix=oai_marc
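
For the "1 set per baseURL" idea, an aggregator needs a rule for turning a source baseURL into a setSpec. The sketch below is one hypothetical rule: keep the host and path, and replace anything outside a conservative character set (letters, digits, and `-_.!~*'()`) with an underscore. Treating that set as the safe one is an assumption of this example.

```python
import re
from urllib.parse import urlparse


def set_spec_for(base_url):
    """Derive a setSpec-like token from a harvested repository's baseURL."""
    parts = urlparse(base_url)
    raw = parts.netloc + parts.path
    # Replace characters outside the assumed safe set, then trim underscores.
    return re.sub(r"[^A-Za-z0-9\-_.!~*'()]", "_", raw).strip("_")
```

Two different baseURLs could collapse to the same token under this rule, so a real aggregator would need a collision check.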

