Searching Digital Content via SRU Ryan Scherle Randall Floyd October 25, 2006.

Slides:



Advertisements
Similar presentations
Richard Jones, Systems Developer Technical Issues for Repository Software Theses Alive! Edinburgh University Library SHERPA Nottingham.
Advertisements

Putting the Pieces Together Grace Agnew Slide User Description Rights Holder Authentication Rights Video Object Permission Administration.
IRRA DSpace April 2006 Claire Knowles University of Edinburgh.
Theo van Veen, Koninklijke Bibliotheek The European Library: opportunities for new services.
SRW/U for DSpace Ralph LeVan Research Scientist. What is SRW/U A Pair of HTTP-based Text Query Protocols – SRW: Search and Retrieve Web Service – SRU:
Z39.50 as a Web Service Ralph LeVan Research Scientist.
OCLC Online Computer Library Center SRW & DSpace Ralph LeVan OCLC Research.
A centre of expertise in digital information management UKOLN is supported by: SRU: An overview of the SRU protocol and how it can be used.
WikiD (Wiki/Data) Jeffrey A. Young OCLC Office of Research Distributed Service Registry Workshop Warwick, UK 14 July 2005.
A REST-ful Web Services Approach to Library Federated Search using SRU Kevin Reiss Rutgers-Newark Law Library CALI 2005 – June 11th.
Ray Denenberg Ralph LeVan Interoperability Standards & Searching Multiple Repositories Workshop 20 March 25, 2006; Washington.
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
Depositing e-material to The National Library of Sweden.
The KB on its way to Web 2.0 Lower the barrier for users to remix the output of services. Theo van Veen, ELAG 2006, April 26.
Information Retrieval in Practice
Introducing Symposia : “ The digital repository that thinks like a librarian”
Overview of Search Engines
Digital Object: A Virtual Online Storage Solution 598C Course Project Huajing Li.
Metadata Standards and Applications 4. Metadata Syntaxes and Containers.
Why create a Gandhara What is it expected to do that the library catalog is not doing? What other benefits can it offer to users? Think of Gandhara as.
Metadata Harvesting The Hague, 13 & 14 January 2009 Julie Verleyen Scientific Coordinator, Europeana Office EuropeanaLocal Knowledge Sharing Workshop.
OCLC Online Computer Library Center CONTENTdm ® Digital Collection Management Software Ron Gardner, OCLC Digital Services Consultant ICOLC Meeting April.
“Old Style” Libraries, Digital Libraries: Convergences, Divergences, And the Troubles in Between.
Z39.50 for Finding It All William E. Moen School of Library and Information Sciences Texas Center for Digital Knowledge University of North Texas Denton,
Implementing an Integrated Digital Asset Management System: FEDORA and OAIS in Context Paul Bevan DAMS Implementation Manager
Lucas Mak and Dao Rong Gong Michigan State University Millennium and XML: Repurposing and Customizing Metadata May , 2009.
1 XML as a preservation strategy Experiences with the DiVA document format Eva Müller, Uwe Klosa Electronic Publishing Centre Uppsala University Library,
ALCME: OAI at OCLC Jeffrey A. Young OCLC Online Computer Library Center, Inc.
Indo-US Workshop, June23-25, 2003 Building Digital Libraries for Communities using Kepler Framework M. Zubair Old Dominion University.
PHOTO CATALOGING AND DELIVERY SERVICE INTRODUCTION AND GETTING STARTED.
7. Approaches to Models of Metadata Creation, Storage and Retrieval Metadata Standards and Applications.
University of North Texas Libraries Building Search Systems for Digital Library Collections Mark E. Phillips Texas Conference on Digital Libraries May.
Linking electronic documents and standardisation of URL’s What can libraries do to enhance dynamic linking and bring related information within a distance.
Fedora and GSearch in a Research Project about Integrated Search Open Repositories 2009 Gert Schmeltz Pedersen DTU Library, Technical Information Center.
Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011.
PLoS ONE Application Journal Publishing System (JPS) First application built on Topaz application framework Web 2.0 –Uses a template engine to display.
Nate Trail Network Development & MARC Standards Office 8/1/2006 With help from Sydney Olive How to Build, Display and Find METS Objects.
FlexElink Winter presentation 26 February 2002 Flexible linking (and formatting) management software Hector Sanchez Universitat Jaume I Ing. Informatica.
Introduction to Web Services Eric Lease Morgan University Libraries of Notre Dame June 24, 2005.
1 CS 502: Computing Methods for Digital Libraries Lecture 19 Interoperability Z39.50.
ICDL 2004 Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer Science Old Dominion University.
METS Navigator Jenn Riley John Walsh Michelle Dalmau David Jiao Indiana University Digital Library Program Digital Library Federation Spring Forum
Introduction to metadata
IUScholarWorks Technical Overview Randall Floyd Digital Library Program Programmer/Database Administrator.
Uwe SchindlerGES 2007 – May 2-4, 2007 Data Information Service based on Open Archives Initiative Protocols and Apache Lucene Uwe Schindler 1, Benny Bräuer.
Ray Denenberg Rob Sanderson “ Key Standards Updates ” SRU Project Briefing April 4, 2006; Washington.
Copyright © 2012 UNICOM Systems, Inc. Confidential Information z/Ware Product Overview illustro Systems International A Division of UNICOM Global.
OAI Overview DLESE OAI Workshop April 29-30, 2002 John Weatherley
GBIF Data Access and Database Interoperability 2003 Work Programme Overview Donald Hobern, GBIF Programme Officer for Data Access and Database Interoperability.
A Multi-Tiered Architecture for Distributed Data Collection and Centralized Data Delivery Stacy Kowalczyk and James Halliday April 28, 2008.
Metadata “Data about data” Describes various aspects of a digital file or group of files Identifies the parts of a digital object and documents their content,
Challenges in the Nursery: Linking a Finding Aid with Online Content Elizabeth Johnson, Lilly Library Jenn Riley, Digital Library Program DL Brown Bag,
SRW/U: Re-Introduction SRW is a Web Services based Information Retrieval Protocol Motivations: Create an easy to implement protocol with the power of Z39.50.
DSpace System Architecture 11 July 2002 DSpace System Architecture.
Feb 24-27, 2004ICDL 2004, New Dehli Improving Federated Service for Non-cooperating Digital Libraries R. Shi, K. Maly, M. Zubair Department of Computer.
Sharing Digital Scores: Will the Open Archives Initiative Protocol for Metadata Harvesting Provide the Key? Constance Mayer, Harvard University Peter Munstedt,
The library is open Digital Assets Management & Institutional Repository Russian-IUG November 2015 Tomsk, Russia Nabil Saadallah Manager Business.
Steven Perry Dave Vieglais. W a s a b i Web Applications for the Semantic Architecture of Biodiversity Informatics Overview WASABI is a framework for.
A technical overview Image Collection Workflow and Tools Michael Durbin 2010 Brown Bag Presentation Series April 21, 2010.
Z39.50 and the ZING Initiatives: MAVIS Users Conference, 2003 November 6, 2003 Larry E. Dixson Library of Congress.
Copyright 2007, Information Builders. Slide 1 iWay Web Services and WebFOCUS Consumption Michael Florkowski Information Builders.
Introduction to metadata for IDAH fellows Jenn Riley Metadata Librarian Digital Library Program.
1 CS 430: Information Discovery Lecture 26 Architecture of Information Retrieval Systems 1.
Virtual Collections VIRTUAL COLLECTIONS LDI Architecture Meeting, Tuesday, July 19.
JAFER Toolkit Project Oxford University 1 JAFER Java-based high level Z39.50 toolkit Matthew Dovey; Colin Tatham; Antony Corfield; Richard Mawby Oxford.
1 Copyright © 2008, Oracle. All rights reserved. Repository Basics.
Web Services Overview Thomas Hickey. 2 What are Web Services? Machine-to-machine communication Run over standard Web protocols –XML syntax, HTTP packaging.
Information Retrieval in Practice
Building Search Systems for Digital Library Collections
IDEALS at the University Of Illinois: A Case Study of Integration Between an IR and Library Discovery Systems Sarah L. Shreeves University of Illinois.
Presentation transcript:

Searching Digital Content via SRU Ryan Scherle Randall Floyd October 25, 2006

Overview What is SRU, and why do we need it? SRU in practice SRU in the DLP Search and browse interfaces to SRU Going forward…

What is SRU? Search/Retrieve via URL The game of the name:  ZNG, ZiNG  SRW/U  SRU  SRU POST  SRU over SOAP  CQL

Why standardize search? Makes locating content easier for end users Increases visibility of existing digital content Eases implementation of new systems, features Search engines that follow standards can be combined in many ways

What about…. Z39.50:  Complicated (eighteen native and extended services vs. one)  Requires connection-based sessions  Binary encoding transmitted directly over TCP/IP OAI-PMH:  Harvesting vs. searching  Focused on metadata only, not text  Can be used in conjunction with SRU Google, OpenSearch:  Very simple query syntax

Benefits of SRU Web services simplify communication Builds on the experience of the Z39.50 community, but sheds Z39.50’s complexity Allows for any type of underlying search engine:  Simple grep applied against a text file  Direct access to a relational database  An index created from search engine software, such as Swish-e or Lucene

Benefits of SRU Web services simplify communication Builds on the experience of the Z39.50 community, but sheds Z39.50’s complexity Allows for any type of underlying search engine:  Simple grep applied against a text file  Direct access to a relational database  An index created from search engine software, such as Swish-e or Lucene

Benefits of SRU Many institutions are implementing SRU:  Library of Congress  OCLC  Index Data  DSpace  Fedora There are SRU tools in many programming languages:  Perl  Python  C  Java  Ruby

Benefits of SRU Many institutions are implementing SRU:  Library of Congress  OCLC  Index Data  DSpace  Fedora There are SRU tools in many programming languages:  Perl  Python  C  Java  Ruby

Operations Explain – learn about the contents of the search system, available result formats Scan – learn about the usage of terms within the search system SearchRetrieve – retrieve documents that match a given query

Explain Describes the SRU Server Explain Response structure: Zeerex  Z39.50 explain, explained and re-engineered in XML  Basic administrative information  Fields to search  Result types Sample Explain record

Scan Allows listing of terms and their frequencies in the index Can be useful for browsing, zero results Could display a controlled vocabulary for a given collection Sometimes difficult to implement, but not required Sample Scan result

SearchRetrieve Submit a query, receive a response Any XML return type is allowed Result sets last for “a while” The simplest SRU client

SRU request parameters query startRecord maximumRecords recordSchema

SRU is simple, yet powerful Supports simple keyword queries, but allows much more complex queries to be specified. Complex queries may be formed by trained users or by task-specific interfaces. Default metadata type for returned records, but many types may be available.

Contextual Query Language CQL has two goals:  Be as simple as possible “Just do the right thing”  Allow powerful searching Some examples of “CQL 1.2” adapted from Ray Denenberg…

cat

(That’s it. The whole query.)

Simple CQL Queries cat (simplest) cat and dog (simple boolean) title = cat (index)

Simple CQL Queries cat (simplest) cat and dog (simple boolean) title = cat (index) dc.title = cat (index qualified)

Qualified indexes title = cat dc.title = cat dc.title="Focus on grammar" AND bib.titleSub="basic level"

Relations Indexes and relations come in pairs: cat title = cat

cat cql.scr (default context set and relation) scr: “server choice relation” cql.serverChoice (default index)

About the = relation = is the same as cql.scr title = “the complete dinosaur” means: find these three words, according to the server’s preferred equality metric (not necessarily as a phrase)

Other text Relations title = "the complete dinosaur" title all "complete dinosaur” title any "dinosaur bird reptile" title adj "the complete dinosaur"

All title all "complete dinosaur“ matches “the complete and unabridged dinosaur“ does not match “the unabridged dinosaur“

title all "old man sea" is the same as title="old" and title="man" and title="sea" All

Any title any “dinosaur bird reptile” does match “the complete dinosaur" and “the unabridged dinosaur"

title any "old man sea" is the same as title="old" or title="man" or title="sea" Any

Adj title adj "the complete dinosaur" matches "the complete dinosaur" and “Follow the complete dinosaur home"

More relations < less > greater <= less or equal >= greater or equal <> not equal

Relation modifiers  title =/stem "these completed dinosaurs“ matches  The Complete Dinosaur

Booleans and or not prox dc.title="cat" prox/distance=2 dc.title="hat"

Context sets CQL DC MARC Music Bib bib.namePersonal=/bib.role=author /bib.roleAuthority=marcrelator "George Orwell"

SRU around the world OAIster Library of Congress (dc, mods, marcxml) Library of Congress Rutgers Law Library UIUC OAI Registry The European Library and the Koninklijke Bibliotheek OCLC  Research publications Research publications  SRW registry SRW registry

SRU in the DLP New infrastructure based on Indexing for SRU search Interaction between the search system and the index The other side of the world: SRU clients

Search system architecture Ingest Query Processor Indexing (gSearch/XTF) Fedora SRU Search Engine Vocabulary Services X1X2 X3 X4 X5 X6 Q1 Q2 Q3 Q5 Q6 Web Services Q4 Lucene Indexes

The DLP SRU server The DLP SRU server is based on an implementation by OCLC:DLP SRU server Modular, can accommodate many databases of differing types Supports both REST (SRU) and SOAP (SRW) interactions Extended to support arbitrary Lucene indexes Configuration: cql.serverChoice = mods.keywords dc.title = dc.title dc.creator = mods.namePart dc.subject = mods.subject bib.classification = mods.classification iudl.collection = collectionID

SRU in the DLP Implementing SRU clients in search and browse interfaces

Search and Browse Interfaces Repository gives us a standardized way to store objects and metadata Still need a way to deliver collections online with minimal development  Past collection interfaces have been developed in a “one-off” manner

clown Process query terms Search and Retrieve Metadata Assemble and Format Metadata Display Results Main Search Controller

clown Process query terms Search and Retrieve Metadata Assemble and Format Metadata Display Results Main Search Controller

clown Process query terms Search and Retrieve Metadata Assemble and Format Metadata Display Results Main Search Controller

clown Process query terms Search and Retrieve Metadata Format Metadata Display Results Main Search Controller Descriptive Metadata Digital Objects

Search and Retrieve Metadata Format Metadata Main Search Controller Format metadata XML for display via XSL transformation Submit query to repository via standard protocol and syntax Receive results from repository in standard XML format Request an image by ID and size via a persistent URL

Search and Retrieve Metadata Format Metadata Main Search Controller Query SRU service via CQL Receive SRU XML response Request an image by ID and size via its PURL Format metadata XML for display via XSL transformation

Process query terms Search and Retrieve Metadata Format Metadata Display Results Main Search Controller Descriptive Metadata Digital Objects Ingest Query Processor Indexing (gSearch/XTF) Fedora SRU Search Engine Vocabulary Services Web Services Lucene Indexes

Process query terms Search and Retrieve Metadata Format Metadata Display Results Main Search Controller Descriptive Metadata Digital Objects Ingest Query Processor Indexing (gSearch/XTF) Fedora SRU Search Engine Vocabulary Services Web Services Lucene Indexes XSLT stylesheets

Sounds good in theory… Will it work in practice - will it keep us from having to build collection-specific search applications? As it turns out, we didn’t have to wait long for an opportunity to find out. As usual, it started something like:  “By the way, we promised the Lilly Library that we would develop an online demo of the Slocum Puzzle collection to coincide with the grand opening of the Slocum Room”

Process query terms Search and Retrieve Metadata Format Metadata Display Results Main Search Controller Descriptive Metadata Digital Objects Ingest Query Processor Indexing (gSearch/XTF) Fedora SRU Search Engine Vocabulary Services Web Services Lucene Indexes

1. Prepare the metadata Map Jerry Slocum's original Filemaker metadata fields to MODS Export and convert the Filemaker data to MODS records 2. Load the collection into Fedora Acquire preexisting digital images of puzzles Load the MODS metadata and images into Fedora via the ingest system Index the data using the gSearch system Delivering the Slocum Puzzle Collection

3. Develop a prototype for the collection interface Developed by Michelle Dalmau Search application uses generic templates to render pages Header, navigation, footers, etc. are separate pieces included as pages are rendered  Actual interface pages are reusable but still allows strong collection branding 4. Develop XSLT stylesheets to transform the SRU XML response into HTML for the result and detail pages Based on the prototype interface Delivering the Slocum Puzzle Collection

The Jerry Slocum Mechanical Puzzle Collection

Order of events 1. Process search terms 2. Convert into CQL query 3. Invoke the SRU server with URL containing CQL query 4. Receive SRU response containing MODS records 5. Apply XSLT stylesheet to transform MODS records to formatted HTML 6. Assemble results page

Insert banner Insert navigation Insert search controls Insert footer Insert formatted HTML from XSL transformation Retrieve inline images via PURLs

Input terms: japan and wood collection:lilly/slocum CQL query will be: cql.serverChoice="japan" AND cql.serverChoice="wood" AND iudl.collection="lilly/slocum" About to contact SRU server with URL: ery=cql.serverChoice%3D%22japan%22+AND+cql.serverChoice%3 D%22wood%22+AND+iudl.collection%3D%22lilly%2Fslocum%22&st artRecord=1 Our result total is: 45 Application log entries from search

Paging Pass startRecord parameter so that you start at any point you want Application handles calculation, then adds startRecord to SRU parameters Based on a page size of 20:  For page 2, set startRecord=21  For page 3, set startRecord=41

Going Forward… Reusable application for text collections using SRU The eXtensible Text Framework (XTF)  Pioneered for use within the DLP by David Jiao and Tamara Lopez  Architecture that supports searching across collections of heterogeneous textual data, and the presentation of results and documents in a highly configurable manner.  Used to deliver the Minutes of the Board of Trustees of IU and The Chymistry of Isaac Newton collections.

Going Forward… DLP federated image search system  Need for a single interface to search all of DLP image content  The reusable search application easily adapted; it inherently works that way  Need to get all collection metadata into the repository and indexed  Need to create XSLT stylesheets to handle collection- specific display characteristics

Going Forward… Vocabulary and thesaurus services  Current search application does not include features to enhance search and browse via thesaurus relationships  A cornerstone feature of the Charles W. Cushman Photograph collection  Need a service to perform search term expansions  Lookup and include preferred term in search  Lookup and include narrower terms in search  Source of terms should be the same thesauri used for descriptive metadata

Vocabulary Services Web Services Cataloging tools Vocabulary maintenance tools Search and browse applications

Going Forward… Anticipated design of vocabulary and thesaurus service will be:  Vocabularies modeled in a relational database with close Z39.19 compliance  Resulting database indexed by Lucene for close integration to Fedora and SRU  Possibly even use SRU to query terms for expansion, using Zthes  An emerging thesaurus context set for CQL, with companion XML schema for vocabularies and relationships

Going Forward… Adapt existing collection interfaces to use new search system based on SRU  Each collection will present new challenges that will require new features throughout the system Work with OCLC to incorporate our changes to their SRU server  Biggest change was to allow SRU to use Lucene indexes versus their database-specific implementation

Going Forward… Integration with other search services  IUCAT, WorldCat, Google  How do we allow external services to search and/or index us via our SRU server?  Conversely, how do we accept search input from external systems and translate to CQL/SRU queries?  For example, allowing Google search appliance at IU to interface with our search system -- “Gocqle”?

Comments or Questions?

Additional links The SRU standard DLP searching system documentation DLP SRU server documentation OCLC’s SRU implementation