New Developments in OAI Michael L. Nelson Old Dominion University OA-Forum May 13-14, 2002 Pisa, Italy


1 New Developments in OAI Michael L. Nelson Old Dominion University http://www.cs.odu.edu/~mln/ mln@cs.odu.edu OA-Forum May 13-14, 2002 Pisa, Italy Many slides borrowed from Herbert Van de Sompel & Carl Lagoze

2 N.B. OAI-PMH 2.0 is not scheduled for public beta release until May 19, 2002 –some of the details of this presentation are still subject to change! –final public release of 2.0 scheduled for June 1

3 What’s New in 2.0?! Good news: OAI-PMH is still Six Verbs + DC Incremental improvements –single XML schema –ambiguities removed –more expressive options –cleaner separation of roles & responsibilities Bad news: not backwards compatible with 1.1

4 Open Archives Initiative The protocol is openly documented, and metadata is “exposed” to at least some peer group (note: rights management can still apply!) Archive defined as a “collection of stuff” -- not the archivist’s definition of “archive”. “Repository” used in most OAI documents. OAI is happening at break-neck speed...

5 The Rise and Fall of Distributed Searching wholesale distributed searching, popular at the time, is attractive in theory but troublesome in practice –Davis & Lagoze, JASIS 51(3), pp. 273-80 –Powell & French, Proc. 5th ACM DL, pp. 264-265 distributed searching of N nodes still viable, but only for small values of N NCSTRL: N > 100; bad NTRS/NIX: N <= 20; ok (but could be better)

6 The Rise and Fall of Distributed Searching Other problems of distributed searching (from STARTS) –source-metadata problem how do you know which nodes to search? –query-language problem syntax varies and drifts over time between the various nodes –rank-merging problem how do you meaningfully merge multiple result sets? Temptations: –centralize all functions “everything will be done at X” –standardize on a single product “everyone will use system Y”

7 Metadata Harvesting Move away from distributed searching Extract metadata from various sources Build services on local copies of metadata –data remains at remote repositories [Diagram: a user searches (e.g. for “cfd applications”) against a local copy of metadata; metadata is harvested offline from each node; each node is independently maintained; all searching, browsing, etc. is performed on the local copy of the metadata; individual nodes can still support direct user interaction]

8 Santa Fe convention vs. OAI-PMH v.1.0/1.1 vs. OAI-PMH v.2.0
            Santa Fe convention   OAI-PMH v.1.0/1.1         OAI-PMH v.2.0
nature      experimental          experimental              stable
verbs       Dienst                OAI-PMH                   OAI-PMH
requests    HTTP GET/POST         HTTP GET/POST             HTTP GET/POST
responses   XML                   XML                       XML
transport   HTTP                  HTTP                      HTTP
metadata    OAMS                  unqualified Dublin Core   unqualified Dublin Core
about       eprints               document like objects     resources
model       metadata harvesting   metadata harvesting       metadata harvesting

9 Santa Fe Convention [02/2000] goal: optimize discovery of e-prints input: the UPS prototype RePEc /SODA “data provider / service provider model” Dienst protocol deliberations at Santa Fe meeting [10/99]

10 OAI-PMH v.1.0 [01/2001] goal: optimize discovery of document-like objects input: SFC DLF meetings on metadata harvesting deliberations at Cornell meeting [09/00] alpha test group of OAI-PMH v.1.0

11 low-barrier interoperability specification metadata harvesting model: data provider / service provider focus on document-like objects autonomous protocol HTTP based XML responses unqualified Dublin Core experimental: 12-18 months OAI-PMH v.1.0 [01/2001]

12 pre-2.0 OAI Timeline Highlights
October 21-22, 1999 - initial UPS meeting
February 15, 2000 - Santa Fe Convention published in D-Lib Magazine
–precursor to the OAI metadata harvesting protocol
June 3, 2000 - workshop at ACM DL 2000 (Texas)
August 25, 2000 - OAI steering committee formed, DLF/CNI support
September 7-8, 2000 - technical meeting at Cornell University
–defined the core of the current OAI metadata harvesting protocol
September 21, 2000 - workshop at ECDL 2000 (Portugal)
November 1, 2000 - Alpha test group announced (~15 organizations)
January 23, 2001 - OAI protocol 1.0 announced, OAI Open Day in the U.S. (Washington DC)
–purpose: freeze protocol for 12-16 months, generate critical mass
February 26, 2001 - OAI Open Day in Europe (Berlin)
July 3, 2001 - OAI protocol 1.1 announced
–to reflect changes in the W3C’s latest XML Schema recommendation
September 8, 2001 - workshop at ECDL 2001 (Darmstadt)

13 OAI-PMH v.2.0 [06/2002] goal: recurrent exchange of metadata about resources between systems input: OAI-PMH v.1.0 feedback on OAI-implementers deliberations by OAI-tech [09/01 -] alpha test group of OAI-PMH v.2.0 [03/02 -]

14 low-barrier interoperability specification metadata harvesting model: data provider / service provider metadata about resources autonomous protocol HTTP based XML responses unqualified Dublin Core stable OAI-PMH v.2.0 [06/2002]

15 process leading to OAI-PMH v.2.0: pre-alpha phase, alpha phase, creation of OAI-tech, beta phase

16 created for 1 year period charge: review functionality and nature of OAI-PMH v.1.0 investigate extensions release stable version of OAI-PMH by 05/02 determine need for infrastructure to support broad adoption of the protocol communication: listserv, SourceForge, conference calls creation of OAI-tech [06/01]

17 US representatives Thomas Krichel (Long Island U) - Jeff Young (OCLC) - Tim Cole - (U of Illinois at Urbana Champaign) - Hussein Suleman (Virginia Tech) - Simeon Warner (Cornell U) - Michael Nelson (NASA) - Caroline Arms (LoC) - Mohammad Zubair (Old Dominion U) - Steven Bird (U Penn.) European representatives Andy Powell (Bath U. & UKOLN) - Mogens Sandfaer (DTV) - Thomas Baron (CERN) - Les Carr (U of Southampton) OAI-tech

18 review process by OAI-tech: identification of issues conference call to filter/combine issues white paper per issue on-line discussion per white paper proposal for resolution of issue by OAI-exec discussion of proposal & closure of issue conference call to resolve open issues pre-alpha phase [09/01 – 02/02]

19 creation of revised protocol document in-person meeting Lagoze - Van de Sompel - Nelson – Warner autonomous decisions internal vetting of protocol document pre-alpha phase [02/02]

20 alpha-1 release to OAI-tech March 1st 2002 OAI-tech extended with alpha testers discussions/implementations by OAI-tech ongoing revision of protocol document alpha phase [02/02 – 05/02]

21 The British Library Cornell U. -- NSDL project & e-print arXiv Ex Libris FS Consulting Inc -- harvester for my.OAI Humboldt-Universität zu Berlin InQuirion Pty Ltd, RMIT University Library of Congress NASA OCLC OAI-PMH 2.0 alpha testers (1/2)

22 OAI-PMH 2.0 alpha testers (2/2) Old Dominion U. -- ARC, DP9 U. of Illinois at Urbana-Champaign U. of Southampton -- OAIA, CiteBase, eprints.org UCLA, Johns Hopkins U., Indiana U., NYU -- sheet music collection UKOLN, U. of Bath -- RDN Virginia Tech -- repository explorer

23 beta phase [05/02] beta release on May 1st 2002 to : registered data providers and service providers interested parties fine tuning of protocol document preparation for the release of 2.0 conformant tools by alpha testers

24 What’s new in OAI-PMH v.2.0? corrections, new functionality, general changes to improve solidity of protocol, quick recap

25 Overview of OAI Verbs
Verb: Function
Identify: description of archive
ListMetadataFormats: metadata formats supported by archive
ListSets: sets defined by archive
ListIdentifiers: OAI unique ids contained in archive
ListRecords: listing of N records
GetRecord: listing of a single record
archival metadata harvesting verbs
most verbs take arguments: dates, sets, ids, metadata formats and resumption token (for flow control)

26 Identify
1.1: Arguments –none Errors –none
2.0: Arguments –none Errors –badArgument

27 ListMetadataFormats
1.1: Arguments –identifier (OPTIONAL) Errors –id does not exist
2.0: Arguments –identifier (OPTIONAL) Errors –badArgument –noMetadataFormats –idDoesNotExist

28 ListSets
1.1: Arguments –resumptionToken (EXCLUSIVE) Errors –no set hierarchy
2.0: Arguments –resumptionToken (EXCLUSIVE) Errors –badArgument –badResumptionToken –noSetHierarchy

29 ListIdentifiers
1.1: Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) Errors –no records match
2.0: Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) –metadataPrefix (REQUIRED) Errors –badArgument –cannotDisseminateFormat –badGranularity –badResumptionToken –noSetHierarchy –noRecordsMatch

30 ListRecords
1.1: Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) –metadataPrefix (REQUIRED) Errors –no records match –metadata format cannot be disseminated
2.0: Arguments –from (OPTIONAL) –until (OPTIONAL) –set (OPTIONAL) –resumptionToken (EXCLUSIVE) –metadataPrefix (REQUIRED) Errors –noRecordsMatch –cannotDisseminateFormat –badGranularity –badResumptionToken –noSetHierarchy –badArgument

31 GetRecord
1.1: Arguments –identifier (REQUIRED) –metadataPrefix (REQUIRED) Errors –id does not exist –metadata format cannot be disseminated
2.0: Arguments –identifier (REQUIRED) –metadataPrefix (REQUIRED) Errors –badArgument –cannotDisseminateFormat –idDoesNotExist
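The argument rules on the six verb slides can be sketched as a small request checker. This is only an illustration under stated assumptions: the table is transcribed from the 2.0 columns of slides 26-31, the function name `check_request` is hypothetical, and only the `badVerb` and `badArgument` codes are modelled.

```python
# Argument rules per verb, transcribed from the 2.0 columns of the verb slides.
VERB_ARGS = {
    "Identify": {"required": set(), "optional": set()},
    "ListMetadataFormats": {"required": set(), "optional": {"identifier"}},
    "ListSets": {"required": set(), "optional": set()},
    "ListIdentifiers": {"required": {"metadataPrefix"},
                        "optional": {"from", "until", "set"}},
    "ListRecords": {"required": {"metadataPrefix"},
                    "optional": {"from", "until", "set"}},
    "GetRecord": {"required": {"identifier", "metadataPrefix"},
                  "optional": set()},
}
# Verbs whose slides list resumptionToken as an EXCLUSIVE argument.
RESUMABLE = {"ListSets", "ListIdentifiers", "ListRecords"}


def check_request(params):
    """Return None if the request looks legal, else an error code string.

    Hypothetical helper: it checks argument names only, not their values.
    """
    args = dict(params)
    verb = args.pop("verb", None)
    if verb not in VERB_ARGS:
        return "badVerb"
    if "resumptionToken" in args:
        # EXCLUSIVE: the token must be the only argument besides the verb.
        if verb in RESUMABLE and len(args) == 1:
            return None
        return "badArgument"
    rules = VERB_ARGS[verb]
    allowed = rules["required"] | rules["optional"]
    given = set(args)
    if not rules["required"] <= given or not given <= allowed:
        return "badArgument"
    return None
```

For instance, a 2.0 ListIdentifiers request without metadataPrefix is rejected here with `badArgument`, which is the main visible change for that verb between the two columns.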

32 general changes clear distinction between protocol and periphery fixed protocol document extensible implementation guidelines: e.g. sample metadata formats, description containers, about containers allows for OAI guidelines and community guidelines

33 general changes clear separation of OAI-PMH and HTTP OAI-PMH error handling all OK at HTTP level? => 200 OK something wrong at OAI-PMH level? => OAI-PMH error (e.g. badVerb)
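A harvester-side sketch of this separation, assuming the 2.0 response markup: HTTP status is checked first, and protocol errors are then read from the XML body. The sample document is a reconstruction (the slide transcript lost its tags), and the namespace URI, the `error` element with its `code` attribute, and the function name `oai_errors` are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed OAI-PMH 2.0 namespace, in ElementTree's {uri}tag notation.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"

# Reconstructed sample of an error response; the markup is an assumption.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <error code="badVerb">ShowMe is not a valid OAI-PMH verb</error>
</OAI-PMH>"""


def oai_errors(http_status, body):
    """Return a list of (code, message) pairs found in an OAI-PMH response.

    An HTTP-level failure raises; an empty list means no protocol error.
    """
    if http_status != 200:
        raise RuntimeError(f"HTTP-level failure: {http_status}")
    root = ET.fromstring(body)
    return [(e.get("code"), (e.text or "").strip())
            for e in root.findall(OAI_NS + "error")]
```

The point of the design is that a 200 OK response can still carry a protocol error, so a harvester has to inspect the body rather than rely on the status line.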

34 OAI Data Model: Resources / Items / Records resource all available metadata about David item Dublin Core metadata MARC metadata SPECTRUM metadata records item = identifier record = identifier + metadata format + datestamp
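The item/record distinction can be sketched with two small classes. This is an illustrative model only: the class and field names are hypothetical, and the payload strings stand in for real metadata.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Record:
    # record = identifier + metadata format + datestamp
    identifier: str
    metadata_prefix: str
    datestamp: str


@dataclass
class Item:
    # item = identifier; it can be disseminated in several metadata formats.
    identifier: str
    # Maps a metadata prefix to a (datestamp, payload) pair.
    metadata: dict = field(default_factory=dict)

    def record(self, metadata_prefix):
        """Return the record for one metadata format of this item."""
        datestamp, _payload = self.metadata[metadata_prefix]
        return Record(self.identifier, metadata_prefix, datestamp)
```

One item with Dublin Core and MARC metadata therefore yields two distinct records that share the same identifier.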

35 general changes better definitions of harvester, repository, item, unique identifier, record, set, selective harvesting oai_dc schema builds on DCMI XML Schema for unqualified Dublin Core usage of must, must not etc. as in RFC2119 wording on response compression

36 general changes all protocol responses can be validated with a single XML Schema easier for data providers no redundancy in type definitions SOAP-ready clean for error handling

37 response, no errors: responseDate 2002-02-08T08:55:46Z; request http://arXiv.org/oai2; identifier oai:arXiv:cs/0112017; datestamp 2001-12-14; setSpec cs; setSpec math; …..

38 response with error: responseDate 2002-02-08T08:55:46Z; request http://arXiv.org/oai2; error: ShowMe is not a valid OAI-PMH verb

39 corrections all dates/times are UTC, encoded in ISO8601, Z-notation 1957-03-20T20:30:00.00Z
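A minimal sketch of producing such a datestamp from a local time, assuming whole-second precision (the YYYY-MM-DDThh:mm:ssZ form named on slide 41, rather than the fractional seconds in the example above). The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def to_oai_datestamp(dt):
    """Convert a time-zone-aware datetime to UTC and format it in Z-notation."""
    if dt.tzinfo is None:
        # A naive datetime cannot be converted to UTC reliably.
        raise ValueError("naive datetime: time zone unknown")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Example: 21:30 at UTC+1 is 20:30 UTC.
local = datetime(1957, 3, 20, 21, 30, 0, tzinfo=timezone(timedelta(hours=1)))
print(to_oai_datestamp(local))
```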

40 idempotency of resumptionToken : return same incomplete list when rT is reissued while no changes occur in the repo: strict while changes occur in the repo: all items with unchanged datestamp new attributes for the resumptionToken: expirationDate completeListSize cursor resumptionToken
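The three new attributes could be read by a harvester roughly as follows. The sample element is hypothetical (its values are invented for illustration, and namespaces are omitted for brevity), and treating all three attributes as optional is an assumption.

```python
import xml.etree.ElementTree as ET

# Hypothetical resumptionToken element; the values are made up.
SAMPLE = ('<resumptionToken expirationDate="2002-06-01T23:20:00Z" '
          'completeListSize="277" cursor="0">AXad31</resumptionToken>')


def parse_resumption_token(xml_text):
    """Return the token and its attributes as a dict (missing ones are None)."""
    el = ET.fromstring(xml_text)
    size = el.get("completeListSize")
    cursor = el.get("cursor")
    return {
        # An empty element is read here as "no further token".
        "token": (el.text or "").strip() or None,
        "expirationDate": el.get("expirationDate"),
        "completeListSize": int(size) if size is not None else None,
        "cursor": int(cursor) if cursor is not None else None,
    }
```

With `completeListSize` and `cursor` a harvester can estimate progress, and `expirationDate` tells it how long the token may be reissued.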

41 harvesting granularity mandatory support of YYYY-MM-DD optional support of YYYY-MM-DDThh:mm:ssZ granularity of from and until must be the same new functionality
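The granularity rule can be sketched as a check on the `from` and `until` arguments. The function names are hypothetical, the sketch raises `ValueError` instead of choosing a protocol error code, and it checks only the shape of the values, not whether they are real calendar dates.

```python
import re

DAY = "YYYY-MM-DD"
SECONDS = "YYYY-MM-DDThh:mm:ssZ"
_PATTERNS = {
    DAY: re.compile(r"\d{4}-\d{2}-\d{2}"),
    SECONDS: re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z"),
}


def granularity(value):
    """Return the granularity name of a datestamp string, or None."""
    for name, pattern in _PATTERNS.items():
        if pattern.fullmatch(value):
            return name
    return None


def check_range(from_=None, until=None, supports_seconds=False):
    """Validate from/until; return their shared granularity (or None if absent)."""
    grains = [granularity(v) for v in (from_, until) if v is not None]
    if None in grains:
        raise ValueError("not a legal datestamp")
    if len(set(grains)) > 1:
        raise ValueError("from and until must have the same granularity")
    if grains and grains[0] == SECONDS and not supports_seconds:
        raise ValueError("repository only supports day granularity")
    return grains[0] if grains else None
```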

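42 Identify more expressive new functionality: repositoryName Library of Congress 1; baseURL http://memory.loc.gov/cgi-bin/oai; protocolVersion 2.0; adminEmail dwoo@loc.gov; adminEmail caar@loc.gov; deletedRecord transient; earliestDatestamp 1990-02-01T00:00:00Z; granularity YYYY-MM-DDThh:mm:ssZ; compression deflate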

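43 header contains set membership of item new functionality: identifier oai:arXiv:cs/0112017; datestamp 2001-12-14; setSpec cs; setSpec math; …..

44 ListIdentifiers returns headers new functionality: responseDate 2002-02-08T08:55:46Z; request http://arXiv.org/oai2; header 1: identifier oai:arXiv:hep-th/9801001, datestamp 1999-02-23, setSpec physic:hep; header 2: identifier oai:arXiv:hep-th/9801002, datestamp 1999-03-20, setSpec physic:hep, setSpec physic:exp; ……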
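A sketch of reading such headers, including set membership, from a ListIdentifiers response. The sample XML is a reconstruction from the values on this slide (the transcript lost its tags), so the element names and namespace URI are assumptions, as is the function name.

```python
import xml.etree.ElementTree as ET

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}  # assumed namespace

# Reconstructed sample; the markup is an assumption, the values are the slide's.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <responseDate>2002-02-08T08:55:46Z</responseDate>
  <request>http://arXiv.org/oai2</request>
  <ListIdentifiers>
    <header>
      <identifier>oai:arXiv:hep-th/9801001</identifier>
      <datestamp>1999-02-23</datestamp>
      <setSpec>physic:hep</setSpec>
    </header>
    <header>
      <identifier>oai:arXiv:hep-th/9801002</identifier>
      <datestamp>1999-03-20</datestamp>
      <setSpec>physic:hep</setSpec>
      <setSpec>physic:exp</setSpec>
    </header>
  </ListIdentifiers>
</OAI-PMH>"""


def parse_headers(xml_text):
    """Return one dict per header: identifier, datestamp and list of sets."""
    root = ET.fromstring(xml_text)
    headers = []
    for h in root.iterfind(".//oai:header", NS):
        headers.append({
            "identifier": h.findtext("oai:identifier", namespaces=NS),
            "datestamp": h.findtext("oai:datestamp", namespaces=NS),
            "sets": [s.text for s in h.findall("oai:setSpec", NS)],
        })
    return headers
```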

45 ListIdentifiers mandates metadataPrefix as argument new functionality http://www.perseus.tufts.edu/cgi-bin/pdataprov? verb=ListIdentifiers &metadataPrefix=olac &from=2001-01-01 &until=2001-01-01 &set=Perseus:collection:PersInfo
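The request above can be assembled programmatically. This is a small sketch using the standard library; the helper name `build_request` and the trailing-underscore convention for `from_` (needed because `from` is a Python keyword) are choices made here, not part of the protocol.

```python
from urllib.parse import urlencode


def build_request(base_url, verb, **args):
    """Build an OAI-PMH GET URL; arguments set to None are left out."""
    params = [("verb", verb)]
    for name, value in args.items():
        if value is not None:
            # from_ -> from, since "from" cannot be used as a keyword name.
            params.append((name.rstrip("_"), value))
    return base_url + "?" + urlencode(params)


url = build_request(
    "http://www.perseus.tufts.edu/cgi-bin/pdataprov",
    "ListIdentifiers",
    metadataPrefix="olac",
    from_="2001-01-01",
    until="2001-01-01",
    set="Perseus:collection:PersInfo",
)
print(url)
```

Note that `urlencode` percent-encodes the colons in the set name, so the printed URL contains `Perseus%3Acollection%3APersInfo`.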

46 character set for metadataPrefix and setSpec extended to URL-safe characters new functionality A-Z a-z 0-9 _ ! ' $ ( ) + - . * identifierType = anyURI repositoryName = string
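A validation sketch built directly from the character list on this slide. Two assumptions are made: the quote character on the slide is read as an ASCII apostrophe, and a setSpec is treated as colon-separated segments (as in the `Perseus:collection:PersInfo` example on slide 45). Since the slides warn that details may change, the final specification may differ from this list.

```python
import re

# One or more characters from the slide's list: A-Z a-z 0-9 _ ! ' $ ( ) + - . *
SEGMENT = re.compile(r"[A-Za-z0-9_!'$()+\-.*]+")


def is_valid_metadata_prefix(value):
    """True if the whole value uses only the listed characters."""
    return SEGMENT.fullmatch(value) is not None


def is_valid_set_spec(value):
    """True if every colon-separated segment is non-empty and legal."""
    return all(SEGMENT.fullmatch(part) is not None for part in value.split(":"))
```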

47 introduction of provenance container to facilitate tracing of harvesting history in the periphery http://an.oa.org oai:r1:plog/9801001 2001-08-13T13:00:02Z oai_dc 2001-08-15T12:01:30Z … … …

48 introduction of friends container to facilitate discovery of repositories in the periphery http://cav2001.library.caltech.edu/perl/oai http://formations2.ulst.ac.uk/perl/oai http://cogprints.soton.ac.uk/perl/oai http://wave.ldc.upenn.edu/OLAC/dp/aps.php4

49 revision of oai-identifier guidelines for collection-level and set-level metadata in the periphery

50 future: adoption, communities, OAI-PMH

51 release of OAI-PMH v.2.0 [06/2002] no backwards compatibility with v.1.0/1.1 stable migration process for registered repos ? formal standardization ? ? SOAP version ~ web services framework [SOAP, WSDL, UDDI] ? the OAI-PMH

52 proliferation of community-specific add-ons for: collection & set level metadata expressive metadata formats (e.g. qualified DC XML Schema) shared set-structures machine readable rights (about the metadata) communities

53 evolution from talking about OAI-PMH to talking about projects that use OAI-PMH to talking about projects and failing to mention they use OAI-PMH => OAI-PMH becomes part of the infrastructure adoption

54 indicators of adoption of OAI-PMH: tools, structural support, service providers, data providers

55 49 registered repositories [11/2001] 65 registered repositories [03/2002] 77 registered repositories [05/2002] 5+ million records many unregistered repositories data providers

56 Arc: cross-searching of registered repositories [Old Dominion U] http://arc.cs.odu.edu OLAC: cross-searching of Language Archive Community repositories http://www.language-archives.org/index.html service providers

57 Scirus scientific search engine [Elsevier] http://www.scirus.com my.OAI: user-tailorable cross-searching of registered repositories [FS Consulting, Inc.] http://www.myoai.com growing interest from web search engines service providers

58 Repository Explorer: interactive exploration of repositories [Virginia Tech] http://www.purl.org/NET/oai_explorer eprints.org: generic OAI-PMH compliant repository software [U of Southampton] http://www.eprints.org ALCME repository and harvester software [OCLC] http://alcme.oclc.org/index.html OAI-PMH tools

59 Kepler [Old Dominion U] your personal OAI data provider: Kepler archivelet the Kepler service provider harvests from archivelets that register archivelet downloadable http://www.dlib.org/dlib/april01/maly/04maly.html exploration

60 DP9 [Old Dominion U] provides entry page to repositories for web crawlers provides bookmarkable URL for OAI record provides resolution of OAI identifier into metadata software downloadable exploration

61 http://www.openarchives.org openarchives@openarchives.org

62 Emergency Backup Slides

63 resumptionToken harvester RDBMS ListRecords Records 1-100, resumptionToken=AXad31 ListRecords, resumptionToken=AXad31 Records 101-200, resumptionToken=pQ22-x ListRecords, resumptionToken=pQ22-x Records 201-277 scenario: harvesting 277 records in 3 separate 100 record “chunks”
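The flow-control scenario above can be simulated without a network. In this sketch the repository is a stand-in function, and its tokens are simply list offsets encoded as strings; real tokens (such as AXad31 in the scenario) are opaque to the harvester, which only echoes them back. All names are hypothetical.

```python
def make_repository(total=277, chunk=100):
    """Return a fake ListRecords function serving `total` records in chunks."""
    records = [f"record-{i}" for i in range(1, total + 1)]

    def list_records(resumption_token=None):
        # In this toy repository the token is just the next offset.
        start = int(resumption_token) if resumption_token else 0
        end = min(start + chunk, total)
        next_token = str(end) if end < total else None
        return records[start:end], next_token

    return list_records


def harvest(list_records):
    """Call list_records until no token is returned; count the requests."""
    harvested, token, requests = [], None, 0
    while True:
        batch, token = list_records(token)
        requests += 1
        harvested.extend(batch)
        if token is None:
            return harvested, requests


records, requests = harvest(make_repository())
print(len(records), requests)
```

The harvester never interprets the token; it stops only when a response arrives without one.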

64 Open Archives Initiative: exposure of metadata for harvesting. Open Archival Information System: ensuring long-term preservation of archival materials. [Diagram labels: OAIS; OAIS w/ an OAI interface] http://www.dlib.org/dlib/april01/04editorial.html http://www.dlib.org/dlib/may01/05letters.html http://ssdoo.gsfc.nasa.gov/nost/isoas/us/overview.html

65 Field of Dreams It should be easy to be a data provider, even if it makes more work for the service provider. –if enough data providers exist, the service providers will come (DPs >> SPs) Open-source / freely available tools –“drop-in” data providers: industrial strength: http://www.eprints.org/ personal size: http://kepler.cs.odu.edu/ –tools to make your existing DL a data provider: http://www.openarchives.org/tools/tools.htm also: OAI-implementers mailing list / mail archive! –service providers: only bits and pieces currently publicly available...

