Delivering MARC/XML records from the Library of Congress catalogue using the open protocols SRW/U and Z39.50
Mike Taylor, Index Data


1 Delivering MARC/XML records from the Library of Congress catalogue using the open protocols SRW/U and Z39.50
Mike Taylor, Index Data, mike@indexdata.com

2 Overview
Where we're headed in the next half-hour: existing standards for library catalogues; the new XML equivalents of these standards; providing XML access to existing catalogues (two services running from two databases; two services running from a single database; a new gateway running over the existing service); the Library of Congress's solution.
Delivering MARCXML using SRW/U | Mike Taylor, Index Data

3 Existing standards for catalogues
The value of existing standards is well understood: MARC (MAchine-Readable Cataloging) records; ISO 2709 (the interchange format for MARC); ANSI/NISO Z39.50 (search and retrieval over the Internet). These standards allow interoperability and co-operation between libraries that other fields can only dream about. (Librarians don't know how lucky they are!)

4 Z39.50 for searching catalogues
[Diagram: a Z39.50 client fetches MARC records from the Library of Congress Z39.50 server over Z39.50.]

5 Z39.50 for searching catalogues
[Diagram: one Z39.50 client connected over Z39.50 to the Library of Congress and British Library Z39.50 servers.]

6 Z39.50 for searching catalogues
[Diagram: one Z39.50 client connected over Z39.50 to the Library of Congress, British Library and local catalogue Z39.50 servers.]

7 Z39.50 for searching multiple catalogues
[Diagram: a metasearching Z39.50 client sends one search over Z39.50 to the Library of Congress, British Library and local catalogue Z39.50 servers.]

8 Trouble in paradise
Then the serpent saith unto Adam, Lo, why doth thy catalogue service not use XML? And Adam saith, Verily, Z39.50 worketh just fine. But the serpent, who was subtle of tongue, saith unto him, But XML is more fashionable. And, behold, Adam was deceived, and did fall.
-- The Book of Standards, ch. 3, v. 4-6.

9 Welcome to the 21st Century
Everything must be XML.
[Diagram: the metasearching Z39.50 client and the Library of Congress, British Library and local catalogue Z39.50 servers from slide 7.]

10 Welcome to the 21st Century
Resistance is useless!
[Diagram: the same metasearching Z39.50 client and three Z39.50 servers.]

11 Catalogue standards in an XML world
The binary USMARC format is superseded by MARCXML.
"As many of the original developers of Dublin Core were Americans, various parochial national standards were referenced. This will hopefully get fixed with the belated discovery of the rest of the planet." (Unattributed, sadly.)
Enter MarcXchange, a MARCXML superset that can represent all the national MARC formats (DANMARC, etc.). (Though repairing MARCXML might have been better.)

12 Catalogue standards in an XML world
The binary Z39.50 protocol is superseded by SRU (Search/Retrieve via URL). This is a NISO-registered standard for expressing queries as rich URLs, which return XML responses containing the records that match the query. For example:
http://sru.miketaylor.org.uk/sru.pl?version=1.1&operation=searchRetrieve&query=dinosaur&startRecord=1&maximumRecords=1&recordSchema=dc
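The example URL can also be built programmatically. Below is a minimal Python sketch that uses only the standard library. The function name and its defaults are illustrative. The parameter names (version, operation, query, startRecord, maximumRecords, recordSchema) are the ones in the example URL. Note that `urlencode` percent-escapes the CQL query, which matters as soon as the query contains spaces, `=` or quotes.

```python
from urllib.parse import urlencode


def sru_search_url(base_url, cql_query, start_record=1, maximum_records=1,
                   record_schema="dc", version="1.1"):
    """Build an SRU 1.1 searchRetrieve URL like the one on the slide (sketch)."""
    params = {
        "version": version,
        "operation": "searchRetrieve",
        "query": cql_query,          # CQL text; urlencode escapes spaces, '=', etc.
        "startRecord": start_record,
        "maximumRecords": maximum_records,
        "recordSchema": record_schema,
    }
    # Dicts keep insertion order, so the parameters appear as listed above.
    return base_url + "?" + urlencode(params)
```

Calling `sru_search_url("http://sru.miketaylor.org.uk/sru.pl", "dinosaur")` reproduces the URL on the slide. A query such as `title=dinosaur and author=martill` is sent as `title%3Ddinosaur+and+author%3Dmartill`.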

13 An SRU response (single DC record)
Response: version 1.1; numberOfRecords 29; recordSchema info:srw/schema/1/dc-v1.1; recordPacking xml; recordPosition 1.
Record: <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema" xmlns="http://purl.org/dc/elements/1.1/"> containing title "Fossils"; creator "Lappi, Megan."; type "text"; publisher "New York, NY: Weigl Publishers"; date "2005"; language "en"; description "Studying fossils -- Fossil facts -- Gone forever -- A fossil is born -- From bone to stone -- Insects in amber -- Dinosaur footprints"; identifiers http://www.loc.gov/catdir/toc/ecip0415/2004004136.html and URN:ISBN:1590362136.
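A client usually extracts the hit count and the record fields from a response like this one. The sketch below does that with Python's ElementTree, run against an abridged, hand-reconstructed copy of the response. Two things are assumptions rather than a copy of the original server output: the SRU 1.1 wrapper namespace (http://www.loc.gov/zing/srw/) and the wrapper element names.

```python
import xml.etree.ElementTree as ET

# SRU 1.1 wrapper namespace (assumed) and the DC record namespace from the slide.
NS = {
    "srw": "http://www.loc.gov/zing/srw/",
    "srw_dc": "info:srw/schema/1/dc-schema",
}

# Abridged, hand-reconstructed version of the response (illustrative only).
SAMPLE = """<srw:searchRetrieveResponse xmlns:srw="http://www.loc.gov/zing/srw/">
  <srw:version>1.1</srw:version>
  <srw:numberOfRecords>29</srw:numberOfRecords>
  <srw:records>
    <srw:record>
      <srw:recordSchema>info:srw/schema/1/dc-v1.1</srw:recordSchema>
      <srw:recordPacking>xml</srw:recordPacking>
      <srw:recordData>
        <srw_dc:dc xmlns:srw_dc="info:srw/schema/1/dc-schema"
                   xmlns="http://purl.org/dc/elements/1.1/">
          <title>Fossils</title>
          <creator>Lappi, Megan.</creator>
          <date>2005</date>
          <identifier>URN:ISBN:1590362136</identifier>
        </srw_dc:dc>
      </srw:recordData>
      <srw:recordPosition>1</srw:recordPosition>
    </srw:record>
  </srw:records>
</srw:searchRetrieveResponse>"""


def parse_sru_response(xml_text):
    """Return (total hit count, list of {DC element name: [values]} dicts)."""
    root = ET.fromstring(xml_text)
    hits = int(root.findtext("srw:numberOfRecords", namespaces=NS))
    records = []
    for dc in root.iterfind("srw:records/srw:record/srw:recordData/srw_dc:dc", NS):
        fields = {}
        for element in dc:
            name = element.tag.split("}", 1)[-1]  # drop the "{namespace}" prefix
            fields.setdefault(name, []).append(element.text)
        records.append(fields)
    return hits, records
```

The hit count (29) is the size of the whole result set. The response carries only the one record that was requested via `maximumRecords=1`.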



16 SRU's big brother: SRW
SRU works by fetching rich URLs. SRW (Search/Retrieve Web service) works over SOAP. In theory, SRW is more powerful and flexible than SRU. In practice, it is harder to implement and runs more slowly. It is still important because many big players (Microsoft, IBM, etc.) have a large investment in SOAP. However, most implementations have used SRU. With HTTP/1.1 persistent connections, performance is fine.
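The sketch below shows the same search expressed as an SRW request. The parameters that SRU puts in the URL travel instead as elements inside a SOAP envelope. The element names and the SRW namespace are assumptions modelled on the SRU parameters. A client would send the resulting envelope as the body of an HTTP POST.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SRW_NS = "http://www.loc.gov/zing/srw/"                  # assumed SRW 1.1 namespace


def srw_search_envelope(cql_query, start_record=1, maximum_records=1,
                        record_schema="dc", version="1.1"):
    """Wrap the parameters an SRU URL carries in a SOAP request body (sketch)."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    request = ET.SubElement(body, f"{{{SRW_NS}}}searchRetrieveRequest")
    for name, value in [("version", version), ("query", cql_query),
                        ("startRecord", start_record),
                        ("maximumRecords", maximum_records),
                        ("recordSchema", record_schema)]:
        ET.SubElement(request, f"{{{SRW_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

The request content is identical to the SRU case. The extra SOAP wrapping and XML parsing on both ends are part of why SRW tends to be heavier to implement.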

17 SRU's query language: CQL
CQL (Common Query Language) is used by SRU and SRW. It may also be used in other contexts (including Z39.50). Its syntax is easy to learn, but very expressive:
dinosaur
title=dinosaur
title=(dinosaur or pterosaur) and author=martill
dc.title=*saur and dc.author=martill
title exact "the complete dinosaur" and date < 2000
name=/phonetic "smith"
fish prox/distance<3/unit=sentence frog
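Most of these examples share a small core grammar: `index relation term` clauses (or a bare term), combined with booleans and parentheses. A minimal recursive-descent parser for that subset might look like the one below. It is a sketch, not a complete CQL implementation. It ignores relation and boolean modifiers, so the `=/phonetic` and `prox/...` examples are out of scope, and it does not handle the `index=(...)` form.

```python
import re

# Quoted strings, parentheses, symbolic relations, or bare words.
TOKEN = re.compile(r'"(?:[^"\\]|\\.)*"|\(|\)|<=|>=|<>|==|[=<>]|[^\s()=<>"]+')
BOOLEANS = {"and", "or", "not"}
SYMBOL_RELATIONS = {"=", "==", "<", ">", "<=", ">=", "<>"}
WORD_RELATIONS = {"exact", "any", "all", "adj", "within"}


def parse_cql(text):
    """Parse a CQL subset into nested tuples; raises ValueError on bad input."""
    tokens = TOKEN.findall(text)
    tree, pos = _query(tokens, 0)
    if pos != len(tokens):
        raise ValueError(f"unexpected token {tokens[pos]!r}")
    return tree


def _query(tokens, pos):
    # CQL booleans share one precedence level and group left to right.
    left, pos = _clause(tokens, pos)
    while pos < len(tokens) and tokens[pos].lower() in BOOLEANS:
        op = tokens[pos].lower()
        right, pos = _clause(tokens, pos + 1)
        left = (op, left, right)
    return left, pos


def _clause(tokens, pos):
    if pos >= len(tokens):
        raise ValueError("unexpected end of query")
    if tokens[pos] == "(":
        tree, pos = _query(tokens, pos + 1)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing ')'")
        return tree, pos + 1
    if pos + 2 < len(tokens) and _is_relation(tokens[pos + 1]):
        index, relation, term = tokens[pos:pos + 3]
        return ("clause", index, relation.lower(), _unquote(term)), pos + 3
    # A bare term searches the server's default index.
    return ("clause", "cql.serverChoice", "=", _unquote(tokens[pos])), pos + 1


def _is_relation(token):
    return token in SYMBOL_RELATIONS or token.lower() in WORD_RELATIONS


def _unquote(term):
    if len(term) >= 2 and term[0] == term[-1] == '"':
        return term[1:-1].replace('\\"', '"')
    return term
```

For example, `parse_cql('title exact "the complete dinosaur" and date < 2000')` yields an `and` node whose children are the `exact` clause and the `<` clause.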

18 Now what?
We have: a mature, functional infrastructure based on MARC and Z39.50; a world out there that is comfortable with XML-based technology; an XML-based equivalent of MARC (MARCXML/MarcXchange); an XML-based equivalent of Z39.50 (SRU). But we don't have actual running SRU servers that deliver MARCXML records. Can we get there from here?

19 Server providers don't want to switch
[Diagram: a Library of Congress SRU server facing an existing Z39.50 client that speaks only Z39.50. Uh-oh!]

20 Client applications don't want to switch
[Diagram: an SRU client facing the Library of Congress Z39.50 server, which does not speak SRU. Uh-oh!]

21 Transition period: run both services
[Diagram: a Z39.50 client uses the Library of Congress Z39.50 server, while an SRU client uses a separate Library of Congress SRU server.]

22 Transition period: run both services
This approach gives client applications a choice: existing client applications continue to work, and new applications can be built using new technology. This flexibility comes at a cost to the service providers, who have to provide not one but two services. How can they do this? There are three approaches.

23 The two-database approach
[Diagram: the Library of Congress Z39.50 server is backed by a MARC database and the Library of Congress SRU server by a separate MARCXML database; label: "Proprietary API".]

24 Why the two-database approach sucks
The two-database approach has the advantage of conceptual and operational simplicity: the two separate systems can be maintained by separate teams. However: THE TWO DATABASES HAVE TO BE KEPT SYNCHRONISED. At best this entails duplication of effort. At worst, synchronisation fails completely, and a record fetched from one database may differ from the same record fetched from the other database (if it exists there at all).
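One way to see the synchronisation problem in concrete terms is to audit the two copies against each other. The minimal sketch below makes two assumptions: both databases can be exported as dictionaries keyed by record id, and each record has already been normalised to a common form, because MARC and MARCXML records cannot be compared byte for byte.

```python
def compare_copies(primary, replica):
    """Compare two record stores, each a dict of record id -> normalised record.

    Returns the ids missing from the replica, the ids only the replica has,
    and the ids whose contents have drifted apart.
    """
    shared = primary.keys() & replica.keys()
    return {
        "missing": sorted(primary.keys() - replica.keys()),
        "extra": sorted(replica.keys() - primary.keys()),
        "differing": sorted(k for k in shared if primary[k] != replica[k]),
    }
```

Each non-empty list corresponds to one of the failure cases on the slide: a record that exists in only one database, or a record that reads differently depending on which service fetched it.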

25 The one-database-two-services approach
[Diagram: the Library of Congress Z39.50 server and the Library of Congress SRU server both use the same MARC database; label: "Proprietary API".]


30 Advantages of the 1D2S approach
When both services use data from the same database, only one copy of the database has to be maintained. This approach has several advantages: eliminates duplication; reduces redundancy; eliminates duplication.

31 The horrible truth
[Diagram: the Library of Congress Z39.50 server and its proprietary database form an opaque black box; the Library of Congress SRU server finds no API.]
When the database (and Z39.50 server) are part of an integrated proprietary system, the SRU server runs into a brick wall.

32 The solution
Z39.50 IS the API!
[Diagram: the Library of Congress SRU server reaches the proprietary database through the Library of Congress Z39.50 server: a black box with a little hole.]

33 Why this is so cute
When the SRU server uses Z39.50 as its API to the database, it is an SRU-to-Z39.50 gateway: its front end is an SRU server and its back end is a Z39.50 client. This rocks because: no duplication of data is necessary; no co-operation is necessary from the existing software; and using the standard Z39.50 protocol as the API to the database means that THE SAME GATEWAY can be used to provide SRU access to ANY CATALOGUE that is already available via Z39.50.

34 A novel application of Z39.50
Z39.50 is most often used to allow a client to query a remote server. Here we are using it as a tightly integrated part of a locally provided service: the gateway will typically run on the same machine as the Z39.50 server, or on a nearby machine on the same LAN. HOWEVER, because Z39.50 is a network API rather than a link-time API, other interesting arrangements are possible.

35 Typical architecture: integrated SRU
[Diagram: an SRU client connects over SRU to the Library of Congress SRU server, which uses the Library of Congress Z39.50 server and its proprietary database (the opaque black box).]

36 Alternative architecture: 3rd-party SRU
[Diagram: an SRU client running in Denmark connects over SRU to a 3rd-party SRU server running in England, which uses the Library of Congress Z39.50 server and proprietary database (the opaque black box) running in the USA.]


38 What's it like?
SRU client software neither knows nor cares that the server it is connected to is really a gateway. The application user knows nothing about the Z39.50 database. You might expect performance to degrade because of the additional step. In practice, with a high-quality gateway, the performance of the SRU server greatly exceeds that of the underlying Z39.50 server. (This is done using magic.)

39 The Library of Congress's solution
The Library of Congress contracted Index Data (that's us) to build an SRU-to-Z39.50 gateway for them. Having built it, we released it under an Open Source licence (the GNU General Public Licence).
The LC SRU server is available to anyone at: http://z3950.loc.gov:7090/Voyager
The gateway is freely available to download at: http://indexdata.com/yazproxy/

40 (Digression: why is it called YAZ Proxy?)
YAZ is our battle-tested and widely deployed Z39.50 toolkit. (It powers 2/3 of all Z39.50 clients and servers worldwide.) YAZ Proxy is so called because it acts as a Z39.50-to-Z39.50 gateway as well as SRU-to-Z39.50 (and SRW-to-Z39.50). Why would you want a Z39.50 proxy? For the same reasons you want a Web proxy such as Squid: to reduce load on the underlying server; to improve client performance through caching; to protect a fragile back end by sanitising client requests; and to balance load over multiple back-end servers.

41 What YAZ Proxy does
For each SRU Search Request that it receives, YAZ Proxy:
1. translates the CQL query into a Z39.50 Type-1 query;
2. embeds the translated query in a Z39.50 Search Request;
3. sends the request to the back-end server;
4. (asynchronously) awaits the Z39.50 Search Response;
5. extracts the MARC records from the response;
6. converts them into MARCXML;
7. embeds the converted records in an SRU Search Response;
8. returns the response to the client.
All this is transparent to the SRU client and the Z39.50 server.
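The eight steps can be sketched as a single request handler. This illustrates the data flow only and is not YAZ Proxy's code. The `FakeZ3950Backend` object, with its `search` and `present` methods, is hypothetical; a real gateway would hold a Z39.50 session and await responses asynchronously. The query translator and the MARC-to-MARCXML converter are passed in as functions. The SRU namespace and the MARCXML schema identifier are assumptions.

```python
import xml.etree.ElementTree as ET

SRU_NS = "http://www.loc.gov/zing/srw/"              # assumed SRU 1.1 namespace
MARCXML_SCHEMA = "info:srw/schema/1/marcxml-v1.1"    # assumed schema identifier


class FakeZ3950Backend:
    """Hypothetical stand-in for a Z39.50 client session; not a real library API."""

    def __init__(self, records):
        self.records = records
        self.last_query = None

    def search(self, type1_query):                   # Search Request -> hit count
        self.last_query = type1_query
        return len(self.records)

    def present(self, start, count):                 # fetch raw records by position
        return self.records[start - 1:start - 1 + count]


def handle_search(params, translate_query, backend, to_marcxml):
    """Serve one SRU searchRetrieve request through a Z39.50 back end (sketch)."""
    hits = backend.search(translate_query(params["query"]))   # steps 1-4
    start = int(params.get("startRecord", "1"))
    maximum = int(params.get("maximumRecords", "10"))
    raw_records = backend.present(start, maximum) if hits else []  # step 5

    response = ET.Element(f"{{{SRU_NS}}}searchRetrieveResponse")
    ET.SubElement(response, f"{{{SRU_NS}}}version").text = "1.1"
    ET.SubElement(response, f"{{{SRU_NS}}}numberOfRecords").text = str(hits)
    records = ET.SubElement(response, f"{{{SRU_NS}}}records")
    for offset, raw in enumerate(raw_records):
        record = ET.SubElement(records, f"{{{SRU_NS}}}record")
        ET.SubElement(record, f"{{{SRU_NS}}}recordSchema").text = MARCXML_SCHEMA
        ET.SubElement(record, f"{{{SRU_NS}}}recordPacking").text = "xml"
        data = ET.SubElement(record, f"{{{SRU_NS}}}recordData")
        data.append(to_marcxml(raw))                  # steps 6-7
        ET.SubElement(record, f"{{{SRU_NS}}}recordPosition").text = str(start + offset)
    return ET.tostring(response, encoding="unicode")  # step 8
```

Because the translator and converter are parameters, the same handler shape works for any back-end catalogue. That mirrors the slide's point that one gateway can front any Z39.50 server.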

42 [Picture: the sauropod dinosaur Brachiosaurus.]
(It's been a while since we had a picture.)

43 YAZ Proxy in detail: performance features
Access to the LC catalogue, whether by Z39.50 or SRU, is much faster through YAZ Proxy than directly. YAZ Proxy re-uses a pool of initialised back-end sessions. It can pre-cache a set of ready-to-use back-end sessions. Query caching avoids repeated identical searches. Record caching makes repeated requests for the same record instantaneous. The total effect is that access via YAZ Proxy is typically 10-100 times faster. (Source: Larry Dixson of the Library of Congress.)
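The query and record caches can be sketched as two least-recently-used maps placed in front of the back end. This illustration rests on assumptions: the `search`/`fetch` back-end interface and the `CountingBackend` test double are hypothetical. Session pooling and pre-initialisation are not shown.

```python
from collections import OrderedDict


class CachingBackend:
    """Query and record caches (LRU) in front of a hypothetical back end."""

    def __init__(self, backend, max_entries=1000):
        self.backend = backend
        self.max_entries = max_entries
        self.searches = OrderedDict()   # query -> hit count
        self.records = OrderedDict()    # (query, position) -> record

    def _lookup(self, cache, key, compute):
        if key in cache:
            cache.move_to_end(key)      # mark as most recently used
            return cache[key]
        value = compute()               # cache miss: ask the back end
        cache[key] = value
        if len(cache) > self.max_entries:
            cache.popitem(last=False)   # evict the least recently used entry
        return value

    def search(self, query):
        return self._lookup(self.searches, query,
                            lambda: self.backend.search(query))

    def fetch(self, query, position):
        return self._lookup(self.records, (query, position),
                            lambda: self.backend.fetch(query, position))


class CountingBackend:
    """Hypothetical back end that counts how often it is really contacted."""

    def __init__(self):
        self.calls = 0

    def search(self, query):
        self.calls += 1
        return 29

    def fetch(self, query, position):
        self.calls += 1
        return f"record {position} for {query!r}"
```

A real proxy must also decide when cached results go stale. That policy is outside this sketch.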

44 YAZ Proxy in detail: load balancing
YAZ Proxy can be configured to balance load across multiple back-end Z39.50 servers. Queries are generally sent to the least heavily loaded back end. This allows a heavily used service to be scaled across multiple servers, distributed, and made robust against system failure. (Arrangements must be made to keep the multiple copies up to date and synchronised.)
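Below is a sketch of least-loaded selection. It assumes that load is measured as the number of requests currently in flight to each back end; the slide does not say how YAZ Proxy measures load. The back-end names are placeholders.

```python
import threading
from contextlib import contextmanager


class LeastLoadedBalancer:
    """Route each request to the back end with the fewest requests in flight."""

    def __init__(self, backends):
        self.in_flight = {name: 0 for name in backends}
        self.lock = threading.Lock()

    @contextmanager
    def acquire(self):
        with self.lock:
            # min() returns the first of several equally loaded back ends.
            name = min(self.in_flight, key=self.in_flight.get)
            self.in_flight[name] += 1
        try:
            yield name
        finally:
            with self.lock:
                self.in_flight[name] -= 1
```

A caller would wrap each back-end request in `with balancer.acquire() as host:`. Using a context manager guarantees that the in-flight count is decremented even if the request fails.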

45 YAZ Proxy in detail: query translation
Both CQL and the Z39.50 Type-1 query allow application-specific extensions (e.g. geospatial searching, thesaurus navigation). Translation from CQL to Type-1 is therefore driven by a simple configuration file which maps CQL index names, relations, etc. into Z39.50 Type-1 query attributes:
index.cql.serverChoice = 1=1016
index.rec.id = 1=12
index.dc.title = 1=4
index.dc.subject = 1=21
relation.< = 2=1
relation.le = 2=2
relationModifier.relevant = 2=102
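The mapping file can be applied mechanically. The Python sketch below loads the configuration lines from the slide and turns an already-parsed query tree into PQF (Prefix Query Format, YAZ's textual notation for Type-1 queries). Two parts are assumptions of this sketch. First, the tree shape: either ("clause", index, relation, term) or (boolean, left, right). Second, relations are looked up literally as `relation.<relation>`, so an unmapped relation contributes no attribute.

```python
import re

# The configuration lines from the slide, verbatim.
SLIDE_CONFIG = """
index.cql.serverChoice = 1=1016
index.rec.id = 1=12
index.dc.title = 1=4
index.dc.subject = 1=21
relation.< = 2=1
relation.le = 2=2
relationModifier.relevant = 2=102
"""


def load_mapping(text):
    """Parse 'key = attr [attr ...]' lines into {key: [attr, ...]}."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # The first whitespace-surrounded '=' separates key from value.
        key, value = re.split(r"\s+=\s+", line, maxsplit=1)
        mapping[key] = value.split()
    return mapping


def to_pqf(node, mapping):
    """Render ("clause", index, relation, term) / (op, left, right) trees as PQF."""
    if node[0] != "clause":
        op, left, right = node                # "and", "or" or "not"
        return f"@{op} {to_pqf(left, mapping)} {to_pqf(right, mapping)}"
    _, index, relation, term = node
    if "index." + index not in mapping:
        raise ValueError(f"no Type-1 mapping for CQL index {index!r}")
    attrs = mapping["index." + index] + mapping.get("relation." + relation, [])
    if re.search(r"\s", term):
        term = '"' + term + '"'               # PQF needs quotes around multi-word terms
    return " ".join(f"@attr {a}" for a in attrs) + " " + term


MAPPING = load_mapping(SLIDE_CONFIG)
```

For example, a title search for `dinosaur` or `pterosaur` becomes `@or @attr 1=4 dinosaur @attr 1=4 pterosaur`. Keeping the mapping in data rather than code is what lets one translator serve back ends with different attribute sets.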

46 YAZ Proxy in detail: record translation
Translating MARC (ISO 2709) records into MARCXML is a core function of YAZ Proxy. It can also be configured to further transform the translated MARCXML records using arbitrary XSLT stylesheets. Standard stylesheets support translation to Dublin Core, MODS and METS. Other formats, such as OAI_DC, are easy to support.
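The ISO 2709 to MARCXML step can be sketched with the standard library alone. The record layout is that of ISO 2709: a 24-byte leader, 12-byte directory entries, 0x1E and 0x1F as field and subfield separators, and 0x1D as the record terminator. The MARCXML namespace is the MARC 21 slim schema. The sketch assumes UTF-8 field data; real catalogue records may use MARC-8 and then need character-set conversion. `build_iso2709` is a hypothetical helper, included only to produce test input.

```python
import xml.etree.ElementTree as ET

FT, SD, RT = b"\x1e", b"\x1f", b"\x1d"  # field terminator, subfield delimiter, record terminator
MARC_NS = "http://www.loc.gov/MARC21/slim"


def _el(parent, name, attrib=None):
    return ET.SubElement(parent, f"{{{MARC_NS}}}{name}", attrib or {})


def iso2709_to_marcxml(raw):
    """Convert one ISO 2709 (binary MARC) record, given as bytes, to a MARCXML element."""
    leader = raw[:24].decode("ascii")
    base = int(leader[12:17])                 # base address of data
    record = ET.Element(f"{{{MARC_NS}}}record")
    _el(record, "leader").text = leader
    directory = raw[24:base - 1]              # 12-byte entries, minus the final terminator
    for i in range(0, len(directory), 12):
        entry = directory[i:i + 12].decode("ascii")
        tag, length, start = entry[:3], int(entry[3:7]), int(entry[7:12])
        field = raw[base + start:base + start + length].rstrip(FT)
        if tag < "010":                       # control fields 001-009: no indicators or subfields
            _el(record, "controlfield", {"tag": tag}).text = field.decode("utf-8")
            continue
        datafield = _el(record, "datafield",
                        {"tag": tag, "ind1": chr(field[0]), "ind2": chr(field[1])})
        for chunk in field[2:].split(SD)[1:]:  # each chunk: 1-byte code, then the value
            _el(datafield, "subfield", {"code": chr(chunk[0])}).text = chunk[1:].decode("utf-8")
    return record


def build_iso2709(fields, leader_template="00000nam a2200000 a 4500"):
    """Hypothetical test helper: serialise [(tag, data)] as ISO 2709 bytes.

    data is a str for control fields, or (ind1, ind2, [(code, value), ...]).
    """
    directory, body = b"", b""
    for tag, data in fields:
        if isinstance(data, str):
            chunk = data.encode("utf-8") + FT
        else:
            ind1, ind2, subfields = data
            chunk = (ind1 + ind2).encode("ascii")
            for code, value in subfields:
                chunk += SD + (code + value).encode("utf-8")
            chunk += FT
        directory += f"{tag}{len(chunk):04d}{len(body):05d}".encode("ascii")
        body += chunk
    directory += FT
    base = 24 + len(directory)
    total = base + len(body) + 1              # +1 for the record terminator
    leader = f"{total:05d}" + leader_template[5:12] + f"{base:05d}" + leader_template[17:]
    return leader.encode("ascii") + directory + body + RT
```

Any further XSLT step would then operate on the MARCXML produced here. That step is not shown, because ElementTree has no XSLT support.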

47 But, Mike! This is too good to be true!
Yes.

48 But how do you people make a living?
Apart from living on good karma, we make money from: bespoke development (e.g. building YAZ Proxy); customisation (e.g. adding support for new XML formats); integration (e.g. making the proxy use local authentication); support contracts (but these are strictly optional); and consultancy. We also provide services such as hosted SRU-to-Z39.50 gateways, so YOUR ORGANISATION could support SRU (and SRW) access, and accelerate its Z39.50 service, without installing any software.

49 Thanks for listening!
You know where to find us. http://indexdata.com/ Tel. +45 3341 0100 Fax. +45 3341 0101

