Extracting XML from Unicorn with OAI and SRU European Unicorn User Group Conference Glasgow Caledonian University September 7th & 8th, 2006 Benoit PAUWELS.

1 Extracting XML from Unicorn with OAI and SRU
European Unicorn User Group Conference, Glasgow Caledonian University, September 7th & 8th, 2006
Benoit PAUWELS, Université Libre de Bruxelles (ULB), Brussels

2 Agenda
Introduction – Unicorn interfaces
Part 1: An OAI frontend for Unicorn
Part 2: An SRU frontend for Unicorn
–Short description of OAI and SRU protocols
–Overview of technical implementation
–Use cases and demos

3 Introduction
OAI and SRU are ‘open’ protocols that permit exchange of metadata between information systems
Well-known Unicorn interfaces:
–Unicorn API server
–Unicorn Webcat/iBistro/iLink server
–Unicorn Z39.50 server
All comply with the philosophy of request/response sequences

4 Unicorn interfaces: API server
[Diagram: a client system sends an API request over a TCP/IP socket to the API server on the Unicorn server, which reads the catalogue database (records and indexes) and answers with an API response made of API datacodes/values.]
Client systems: SirsiDynix Character client (C), Workflows client (Java), Themes client
Communication protocol: TCP/IP socket
Information exchange protocol: proprietary SirsiDynix API requests/responses
Returned record structure: proprietary SirsiDynix format (data-codes and -values)

5 Unicorn interfaces: iLink
[Diagram: a client system sends an iLink request (URL) over HTTP to the Web server on the Unicorn server; iLink reads the catalogue database (records and indexes) and returns an HTML page over HTTP.]
Client system: any Web browser
Communication protocol: HTTP
Information exchange protocol: URL requests / HTML responses
Returned record structure: HTML

6 Unicorn interfaces: Z39.50
[Diagram: a client system sends a Z39.50 request to the Z39.50 server on the Unicorn server, which reads the catalogue database (records and indexes) and returns a Z39.50 response carrying MARC21 records.]
Client system: any Z39.50 client
Communication protocol: Z39.50 specific
Information exchange protocol: Z39.50 specific
Returned record structure: typically MARC21

7 Unicorn interfaces
API: Proprietary
–low interoperability level
HTML: Record data not well structured
–low reusability level
Z39.50: Protocol specific
–more difficult to implement (steep learning curve)
–Z39.50 is stateful
==> Difficult to integrate into today’s web services environments
–communication: use HTTP
–information exchange: use open protocols (like OAI and SRU)
–record data structure: use XML (according to a well-defined XML Schema)

8 2 new Unicorn interfaces
HTTP / Open / XML
–OAI-PMH: Open Archives Initiative – Protocol for Metadata Harvesting
–SRU: Search and Retrieve via URL

9 OAI-PMH: the protocol
[Diagram: a service provider sends HTTP embedded OAI requests to the Web server of a data provider; the OAI frontend in front of the document archive returns HTTP embedded OAI responses.]

10 OAI-PMH: the protocol
‘Harvester collects metadata from archives’
Stateless protocol: sequence of OAI requests/responses over HTTP
Just harvesting -- NOT searching

11 OAI-PMH: the protocol
OAI requests
HTTP GET|POST requests
Syntax
–BASE URL: host + port + path of OAI request handler
–key=value pairs
Examples:
–http://www.cible.ulb.ac.be:80/cgi-bin/OAI20/catalog?verb=Identify
–http://www.biomedcentral.com/oai/1.1/bmcoai.asp?verb=GetRecord&identifier=oai:bmc: &metadataPrefix=oai_dc
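The request syntax above (base URL plus key=value pairs) can be sketched in a few lines of Python. This is an illustration only, not part of the ULB frontend: the helper name `oai_request_url` is hypothetical, and the base URL is the one shown on the slides.

```python
from urllib.parse import urlencode

# Base URL taken from the slides; the helper below is a hypothetical illustration.
BASE_URL = "http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog"


def oai_request_url(base_url, verb, **arguments):
    """Build an OAI-PMH GET URL: base URL + '?' + key=value pairs."""
    params = {"verb": verb}
    # Optional arguments passed as None are simply left out of the request.
    params.update({key: value for key, value in arguments.items() if value is not None})
    return base_url + "?" + urlencode(params)


identify_url = oai_request_url(BASE_URL, "Identify")
get_record_url = oai_request_url(
    BASE_URL, "GetRecord", identifier="oai:ulbcat:245000", metadataPrefix="marc21"
)
print(identify_url)
print(get_record_url)
```

Note that `urlencode` percent-encodes the colons in the identifier, which is equivalent on the wire. Because `from` is a Python keyword, a `from` argument would have to be passed as `**{"from": ...}`.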

12 OAI-PMH: the protocol
OAI responses
XML encoded bytestreams, containing the records
Record = triplet
–header (unique OAI identifier)
–metadata
–about
Metadata schemes
–XML Schema
–Minimum: unqualified Dublin Core
–Community specific
Example of a record (catkey from ULB catalogue):
–oai_dc, marc21, umods
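To make the header/metadata/about triplet concrete, here is a minimal sketch that reads one record out of a GetRecord response with Python's standard library. The sample response is hand-written for illustration (placeholder title, creator and datestamp); it is not actual output of the ULB catalogue. The namespace URIs are the standard OAI-PMH 2.0 and Dublin Core ones.

```python
import xml.etree.ElementTree as ET

NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hand-written sample, NOT real catalogue output; values are placeholders.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <GetRecord>
    <record>
      <header>
        <identifier>oai:ulbcat:245000</identifier>
        <datestamp>2006-09-07</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Placeholder title</dc:title>
          <dc:creator>Placeholder, Author</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </GetRecord>
</OAI-PMH>
"""


def parse_record(xml_text):
    """Split the first record of a response into its three parts."""
    root = ET.fromstring(xml_text.strip())
    record = root.find(".//oai:record", NS)
    header = record.find("oai:header", NS)
    metadata = record.find("oai:metadata", NS)
    about = record.findall("oai:about", NS)  # optional, may be empty
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "titles": [el.text for el in metadata.iterfind(".//dc:title", NS)],
        "about_count": len(about),
    }


parsed = parse_record(SAMPLE_RESPONSE)
print(parsed)
```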

13 OAI-PMH: the protocol
Simple: 6 OAI requests/responses
Identify
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=Identify
ListMetadataFormats [identifier]
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListMetadataFormats
ListSets
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListSets
GetRecord identifier, metadataPrefix
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=GetRecord&identifier=oai:ulbcat:245000&metadataPrefix=marc21

14 OAI-PMH: the protocol
Simple: 6 OAI requests/responses
ListRecords metadataPrefix, [from,until,set]
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=oai_dc
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=mhld21&set=elper
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListRecords&metadataPrefix=marc21&from=
ListIdentifiers metadataPrefix, [from,until,set]
–http://www.cible.ulb.ac.be/cgi-bin/OAI20/catalog?verb=ListIdentifiers&metadataPrefix=oai_dc

15 OAI frontend for Unicorn
Implementation of the data provider functionality (2001)
–pick a template and interface with Unicorn through Unicorn database tools
–http://www.openarchives.org/tools/tools.html
Our choice: Object Oriented Perl frontend (H. Suleman – Virginia Tech)

16 OAI frontend for Unicorn
[Diagram: processing chain on the Unicorn server, between the HTTP server and the Unicorn database]
–HTTP embedded OAI request reaches the HTTP server (CGI)
–OAI C wrapper: fork in ‘sirsi’ environment
–OAI.pl: call the appropriate OAI request handler, retrieve metadata from Unicorn database, format in XML
–HTTP embedded OAI response

17 OAI frontend for Unicorn
Example: implementation of the GetRecord request
verb=GetRecord&identifier=oai:ulbcat:245000&metadataPrefix=oai_dc
1. Get metadata from Unicorn for catkey
   $record = `echo $catkey | catalogdump -of | filtermarc -iALL -od -Ds`;
   _ = split('\|',`echo $catkey | selcatalog -iK - opr`);
2. Convert ANSEL character set into ISO-LATIN-1
3. Map from MARC to oai_dc
4. Format into XML
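Steps 3 and 4 above (map from MARC to oai_dc, then format into XML) might look roughly like the following Python sketch. It is not the Perl code of the ULB frontend. It assumes the record has already been fetched and character-converted (steps 1 and 2) into a plain dictionary of MARC tag to values, and the tag-to-element table is a small illustrative crosswalk, not the mapping actually used at ULB.

```python
import xml.etree.ElementTree as ET

OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Illustrative crosswalk only (assumption); a real mapping covers many more tags.
MARC_TO_DC = {"100": "creator", "245": "title", "260": "publisher", "650": "subject"}


def marc_to_oai_dc(marc_fields):
    """Map a {MARC tag: [values]} dict to an oai_dc XML string."""
    ET.register_namespace("oai_dc", OAI_DC_NS)
    ET.register_namespace("dc", DC_NS)
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for tag, values in sorted(marc_fields.items()):
        element_name = MARC_TO_DC.get(tag)
        if element_name is None:
            continue  # tags without a Dublin Core equivalent are skipped
        for value in values:
            child = ET.SubElement(root, f"{{{DC_NS}}}{element_name}")
            child.text = value  # ElementTree escapes &, < and > on output
    return ET.tostring(root, encoding="unicode")


# Placeholder record, not taken from the ULB catalogue.
example_xml = marc_to_oai_dc({"245": ["Physics & philosophy"], "100": ["Placeholder, Author"], "999": ["local"]})
print(example_xml)
```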

18 OAI frontend for Unicorn
Example: implementation of the ‘set’ parameter of the ListRecords request
verb=ListRecords&metadataPrefix=oai_dc&set=elper
Precompile set as a file of catkeys
–name of file: « name of set_catkeys », for example einstein_albert_catkeys, elper_catkeys, sd_catkeys, all_catkeys
–through periodic execution of « mkoaisets » custom report
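Resolving a set name to its precompiled catkey file could be sketched as follows. The function names are hypothetical, and two points are assumptions rather than facts from the slides: that each file holds one catkey per line, and that a request without a set falls back to all_catkeys.

```python
import re
from pathlib import Path


def set_catkeys_path(sets_dir, set_name=None):
    """Return the path of the precompiled catkey file for a set."""
    name = set_name or "all"  # assumption: no set means the all_catkeys file
    # Accept only simple names so a request cannot point outside sets_dir.
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"invalid set name: {name!r}")
    return Path(sets_dir) / f"{name}_catkeys"


def read_catkeys(path):
    """Read catkeys, assuming one catkey per line; blank lines are skipped."""
    with open(path, encoding="ascii") as handle:
        return [line.strip() for line in handle if line.strip()]
```

With this layout, serving `set=elper` only requires reading `elper_catkeys` and running the record lookup for each catkey it contains.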

19 OAI frontend for Unicorn
Example: implementation of the ‘from/until’ parameters of the ListRecords request
verb=ListRecords&metadataPrefix=oai_dc&from= &until=
BRS index on creation/modification date?
Every Unicorn record that gets created or modified is ‘touched’ in the ‘textedit’ and ‘browsedit’ directories
Custom report ‘cadutext’
–saves catkeys to /Savedkeys/adutext/rptid
–adds line ‘rptid|date|status’ to /Lastruns/cadutext
Example: « from= &until= »
–obtain report ids for all runs of cadutext after the ‘from’ date and before the ‘until’ date from the file /Lastruns/cadutext
–for each of these report ids: obtain catkeys from /Savedkeys/adutext/rptid and save them to randomnumber_catkeys file
–sort and uniq the randomnumber_catkeys file
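The selection logic described above can be sketched in Python. This is an illustration under stated assumptions, not the actual implementation: the date format inside the ‘rptid|date|status’ lines is assumed to be YYYY-MM-DD, both bounds are treated as inclusive, and the per-report catkey files are stood in for by an in-memory dictionary.

```python
from datetime import date, datetime


def parse_lastruns(lines):
    """Parse 'rptid|date|status' lines into (rptid, date) pairs."""
    runs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rptid, raw_date, _status = line.split("|")
        # Assumed date format; the real Lastruns file may differ.
        runs.append((rptid, datetime.strptime(raw_date, "%Y-%m-%d").date()))
    return runs


def report_ids_between(runs, from_date, until_date):
    """Keep the report ids whose run date falls within [from_date, until_date]."""
    return [rptid for rptid, run_date in runs if from_date <= run_date <= until_date]


def merge_catkeys(catkeys_by_report, report_ids):
    """Collect catkeys of the selected reports, deduplicated and sorted."""
    merged = set()
    for rptid in report_ids:
        merged.update(catkeys_by_report.get(rptid, []))
    return sorted(merged)  # same effect as 'sort' followed by 'uniq'


# Made-up sample data for illustration.
sample_runs = parse_lastruns(["r1|2006-07-01|OK", "r2|2006-08-15|OK", "r3|2006-09-01|OK"])
selected = report_ids_between(sample_runs, date(2006, 8, 1), date(2006, 9, 30))
catkeys = merge_catkeys({"r2": ["300", "100"], "r3": ["100", "200"]}, selected)
print(selected, catkeys)
```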

20 OAI frontend for Unicorn
Limitations of implementation:
–ListRecords/ListIdentifiers:
 The from and until parameters are not permitted if the set parameter is given on the request
 The from and until parameters are permitted if the set parameter is not given on the request, but their values should fall within a certain date range (at this moment arbitrarily set to ‘today - 2 months’ and ‘today’)
–Deleted records
Complete source code and documentation available on the API Repository (http://sirsiapi.org)
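The two ListRecords/ListIdentifiers restrictions can be expressed as a small argument check. The sketch below is hypothetical: the function name is invented, ‘2 months’ is approximated as 60 days, and answering with the OAI-PMH `badArgument` error code is an assumption about how such a request might be rejected, not a statement about what the ULB frontend returns.

```python
from datetime import date, timedelta

WINDOW_DAYS = 60  # stand-in for 'today - 2 months' (assumption)


def check_selective_harvest(set_name, from_date, until_date, today):
    """Return an error string if the argument combination is unsupported, else None."""
    has_dates = from_date is not None or until_date is not None
    if set_name is not None and has_dates:
        return "badArgument: from/until cannot be combined with set"
    earliest = today - timedelta(days=WINDOW_DAYS)
    for value in (from_date, until_date):
        if value is not None and not (earliest <= value <= today):
            return "badArgument: date outside the supported range"
    return None


today = date(2006, 9, 7)
print(check_selective_harvest("elper", date(2006, 9, 1), None, today))
print(check_selective_harvest(None, date(2006, 9, 1), None, today))
```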

21 OAI frontend - use ULB
Use case 1: Vlink - OpenURL resolver system
joint project with Vrije Universiteit Brussel (VUB)
[Diagram: sources (ULB iLink, JSTOR, ISI Web of Science, Elsevier ScienceDirect, OVID, WebSpirs) send an OpenURL to Vlink, which consults the Vlink knowledge base and returns an HTML page of extended services.]

23 OAI frontend - use ULB
Use case 1: Vlink - OpenURL resolver system
OpenURL sent from iLink
sid=ULB:Webcat&id=oai:ulbcat:
This OpenURL does not contain enough metadata for the specific item
==> Vlink does a fetch back to Unicorn through an OAI GetRecord request to obtain a full MARC21 bibliographic description
verb=GetRecord&identifier=oai:ulbcat:617924&metadataPrefix=marc21
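The fetch-back step can be sketched as a translation from the incoming OpenURL query string to a GetRecord query string. This is a hypothetical illustration, not Vlink's code. The slide's OpenURL example is truncated after `oai:ulbcat:`, so the complete identifier used below is borrowed from the GetRecord example for demonstration purposes.

```python
from urllib.parse import parse_qs, urlencode


def getrecord_query_from_openurl(openurl_query, metadata_prefix="marc21"):
    """Turn the 'id' of an OpenURL query string into an OAI GetRecord query string."""
    params = parse_qs(openurl_query)
    identifier = params["id"][0]
    if not identifier.startswith("oai:"):
        raise ValueError("OpenURL id is not an OAI identifier")
    return urlencode(
        {"verb": "GetRecord", "identifier": identifier, "metadataPrefix": metadata_prefix},
        safe=":",  # keep the colons of the OAI identifier readable
    )


# Identifier value assumed for illustration (see lead-in).
query = getrecord_query_from_openurl("sid=ULB:Webcat&id=oai:ulbcat:617924")
print(query)
```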

24 OAI frontend - use ULB
Use case 1: Vlink - OpenURL resolver system
Feed Vlink Knowledge Base through OAI harvesting
[Diagram: Vlink harvests Unicorn over OAI-PMH to fill the Vlink Knowledge Base.]
verb=ListRecords&metadataPrefix=mhld21&set=elper

25 OAI frontend - use ULB
Use case 2: Unicat - Virtual Union Catalog of Belgium
[Diagram: data providers (university library catalogs on Unicorn, Aleph, VIRTUA and VUBIS, a public museum, others) are collected over OAI by the Unicat Harvester into the Union OAI Archive (central repository); the Unicat Indexer builds the search/browse indexes; the end user reaches them as HTML through the Unicat WWW Gateway; SRU also appears as an access path.]

26 SRU: the protocol
[Diagram: a client system sends an SRU request over HTTP to the Web server on the Unicorn server; the SRU frontend reads the catalogue database (records and indexes) and returns an SRU response in XML over HTTP.]
Communication protocol: HTTP
Information exchange protocol: SRU
Returned record structure: XML

27 SRU: the protocol
‘Client searches and retrieves metadata records from an archive’
Stateless protocol: sequence of SRU requests/responses over HTTP
Search and retrieve (in contrast to OAI: harvesting)

28 SRU: the protocol
SRU requests
HTTP GET requests
Syntax
–BASE URL: host + port + path of SRU request handler
–key=value pairs
3 possible requests (operations)
–explain: serves to record the facilities available at an SRU server; used by clients to self-configure; the returned explain record is in XML and follows the ZeeRex Schema
–scan: allows the client to request a range of the available terms at a given point within a list of indexed terms; enables clients to present an ordered list of values and, if supported, how many hits there would be for a search on that term
–searchRetrieve

29 SRU: the protocol
searchRetrieve operation
searchRetrieve (principal) parameters
–version: version of the request; current protocol version: 1.1
–query: query expressed in CQL
–startRecord: position within the sequence of matched records of the first record to be returned
–maximumRecords: number of records requested to be returned
–recordSchema: schema requested for the records to be returned
–stylesheet: URL for an XML stylesheet. The client requests that the server simply return this URL in the response.
CQL
« Traditionally, query languages have fallen into two camps: Powerful, expressive languages, not easily readable nor writable by non-experts (e.g. SQL, PQF, and XQuery); or simple and intuitive languages not powerful enough to express complex concepts (e.g. CCL and google). CQL tries to combine simplicity and intuitiveness of expression for simple, every day queries, with the richness of more expressive languages to accommodate complex concepts when necessary. » (http://www.loc.gov/standards/sru/cql)
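As an illustration of the searchRetrieve parameters listed above, the following Python sketch assembles a request URL and percent-encodes the CQL query. The helper name is hypothetical, and the base URL and stylesheet are the ones that appear in the YAZ Proxy examples later in this presentation.

```python
from urllib.parse import quote, urlencode


def search_retrieve_url(base_url, query, start_record=1, maximum_records=10,
                        record_schema=None, stylesheet=None, version="1.1"):
    """Build an SRU searchRetrieve GET URL from the principal parameters."""
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": query,  # CQL, encoded below
        "startRecord": start_record,
        "maximumRecords": maximum_records,
    }
    if record_schema:
        params["recordSchema"] = record_schema
    if stylesheet:
        params["stylesheet"] = stylesheet
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return base_url + "?" + urlencode(params, quote_via=quote)


sru_url = search_retrieve_url(
    "http://bib49.ulb.ac.be:9000/Cible",
    'title all "einstein albert"',
    stylesheet="http://bib49.ulb.ac.be/cibleTypo3.xsl",
)
print(sru_url)
```

Encoding the query matters because CQL uses spaces, quotes and `=` signs, all of which are significant inside a URL.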

30 SRU: the protocol
searchRetrieve operation
Examples of CQL queries:
–dinosaur
–title = "complete dinosaur"
–title exact "the complete dinosaur"
–dinosaur not reptile
–dinosaur and bird or dinobird
–publicationYear < 1980
–title all "complete dinosaur" (title contains all of the words: ‘complete’, and ‘dinosaur’)
–title any "dinosaur bird reptile" (title contains any of the words: ‘dinosaur’, ‘bird’, or ‘reptile’)
–ribs prox/distance<=5 chevrons (a more specific proximity query: ‘ribs’ within 5 words of ‘chevrons’)

31 SRU: the protocol
searchRetrieve operation -- examples
–&query=author=einstein
–&maximumRecords=10&startRecord=1&query=author=einstein
–&maximumRecords=10&startRecord=1&query=author=einstein&recordSchema=dc
–&maximumRecords=10&startRecord=1&query=author all "einstein albert"
–&maximumRecords=10&startRecord=1&query=title all "einstein albert"
–&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleCanevas.xsl
–&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl

32 SRU frontend for Unicorn
[Diagram: a client system sends an SRU request over HTTP to the Web server on the Unicorn server; the SRU frontend reads the catalogue database (records and indexes) and returns an SRU response in XML over HTTP.]

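33 SRU frontend for Unicorn
[Diagram: a client system sends an SRU request over HTTP to the Web server hosting an SRU/Z39.50 gateway; the gateway forwards it as a Z39.50 request to the Z39.50 frontend of the Unicorn server, which reads the catalogue database (records and indexes); the Z39.50 response is returned to the client as an SRU response in XML over HTTP.]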

34 SRU frontend for Unicorn
SRU/Z39.50 Gateway: YAZ Proxy (Index Data)
–Implemented at ULB: 7/2006 (2 days)
–config.xml: target bib7.ulb.ac.be:2200 with pqf.properties; target velma.library.mun.ca:2200 with pqf.slavko.properties
–explain.xml: ZeeRex XML record as response to ‘explain’ operation
–pqf.properties: specifies the mapping of various CQL indexes, relations, etc. into Type-1 query attributes
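The idea behind the pqf.properties mapping can be shown with a toy translator from a single CQL clause to a PQF string. This is only a sketch of the principle, not YAZ Proxy's implementation and not the ULB configuration. The attribute numbers are standard Bib-1 values (use attribute 4 for title, 1003 for author, 1016 for any; relation attributes 1 to 5 for <, <=, =, >=, >), but the choice of index names in the table is an assumption.

```python
# Toy CQL-to-PQF mapping for one 'index relation term' clause (illustrative only).
INDEX_ATTRS = {
    "title": "1=4",                 # Bib-1 use attribute: title
    "author": "1=1003",             # Bib-1 use attribute: author
    "cql.serverChoice": "1=1016",   # Bib-1 use attribute: any
}
RELATION_ATTRS = {"<": "2=1", "<=": "2=2", "=": "2=3", ">=": "2=4", ">": "2=5"}


def simple_cql_to_pqf(index, relation, term):
    """Translate one CQL clause into a PQF (Type-1) query string."""
    if '"' in term:
        raise ValueError("quotes inside the term are not handled by this sketch")
    attributes = [INDEX_ATTRS[index], RELATION_ATTRS[relation]]
    prefix = " ".join(f"@attr {attribute}" for attribute in attributes)
    return f'{prefix} "{term}"'


pqf = simple_cql_to_pqf("author", "=", "einstein")
print(pqf)
```

A gateway applies this kind of lookup to every clause of the parsed CQL query, which is why each Z39.50 target can have its own properties file.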

35 SRU frontend for Unicorn
YAZ Proxy
–http://bib49.ulb.ac.be:9000/Cible?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
–http://bib49.ulb.ac.be:9000/Slavko?version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=title all "einstein albert"&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl

36 SRU frontend: use ULB
Seamless integration of catalog searches in CMS Typo3
Example
–HTML page containing biography of famous Belgian historian Henri Pirenne
–frame pointing to the following URL: version=1.1&operation=searchRetrieve&maximumRecords=10&startRecord=1&query=pirenne%20and%20epub-dnu-*&stylesheet=http://bib49.ulb.ac.be/cibleTypo3.xsl
Project
–Unicorn contains descriptions of databases, websites, etc. with local thematic classification codes in 653
–create thematic websites within our CMS, containing frames that list available databases per theme

