Building Search Systems for Digital Library Collections

Presentation transcript:

Building Search Systems for Digital Library Collections
Mark E. Phillips
Texas Conference on Digital Libraries
May 31, 2007, Austin, Texas
University of North Texas Libraries

University of North Texas Libraries - Digital Initiatives
Library Digital Collections = 31,000+ digital objects
3 “systems”:
  Congressional Research Service Archive: 9,500+ CRS reports
  Portal to Texas History: 20,000+ records – 115,205 files
  UNT Libraries “Digital Collections”: 1,800+ records – 131,481 files
Digital object types:
  Images = 18,282
  Physical Objects = 1,019
  Texts = 11,668
  Websites = 46
  Sound Records = 20

Infrastructure
UNT Libraries Digital Library Infrastructure
  Highly customized installation of IndexData’s Keystone Digital Library System
  OAIS-based system
  Digital objects housed as XML files on the filesystem
  One XML file per digital object
  Supports simple, complex, and link records
  Custom workflow for batch ingest
  Manages web-presentable files and descriptive and preservation metadata
  Digital masters stored in a separate system

Search 1.0
  Keystone-supplied search
  Zebra retrieval engine
  1 index per “system”
  Highly customizable search system
  Vendor-supplied search interface and functionality

Search 1.0 - Issues
  Difficult configuration
  Issues with large XML file retrieval (10MB+ XML files)
  Search grammar not functioning correctly
  Relevance ranking was “magic”
  No custom searching
  Only searching at the digital object level

Search 1.5
  MySQL database for page-level searching
  In Document Searching (IDS)
  Two levels of granularity (Zebra = object, MySQL = page)
  Easy customization
  More documentation on relevance ranking
  Logical search grammars

Search 1.5 – Issues
  Different search grammars
    Zebra vs. MySQL fulltext
  Scaling issues
    Search performance
    System resources

Search System Criteria
  Customizable relevance ranking
  Sorting
  Simple search syntax
  Fielded Searching
  Term Modifiers
    Wildcard Searches
    Fuzzy Searches
    Proximity Searches
    Range Searches
  Boolean Operators
  Grouping
  Caching
  Implemented as a web-service
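The syntax criteria above map closely onto the classic Lucene query parser syntax. The sketch below lists one example query per criterion and shows how each would be URL-encoded for an HTTP search service. The field names (title, date) and the search terms are made up for illustration and are not taken from the UNT indexes.

```python
from urllib.parse import quote_plus

# One example per criterion, written in classic Lucene query parser syntax.
# Field names (title, date) and terms are hypothetical.
EXAMPLE_QUERIES = {
    "fielded": 'title:"texas history"',
    "single-character wildcard": "te?t",
    "multi-character wildcard": "librar*",
    "fuzzy": "denton~",
    "proximity": '"digital library"~10',
    "range": "date:[19000101 TO 19501231]",
    "boolean and grouping": "texas AND (history OR heritage) NOT oklahoma",
    "term boost": "texas^4 history",
}


def encode_query(query: str) -> str:
    """URL-encode a query string so it can be sent as an HTTP parameter."""
    return quote_plus(query)


if __name__ == "__main__":
    for kind, query in EXAMPLE_QUERIES.items():
        print(f"{kind:28} {query:48} {encode_query(query)}")
```

Encoding matters because characters such as `:`, `"`, `[` and `^` carry meaning in the query grammar and must survive the trip through a URL unchanged.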

Search 2.0
Solr is an open source enterprise search server based on the Lucene Java search library.
  XML/HTTP based
  Hit highlighting
  Faceted search
  Caching
  Replication
  Web administration interface
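Because Solr is driven over HTTP, a search is just a URL with query parameters. The following is a minimal sketch of building such a URL with faceting and hit highlighting switched on, using Solr's standard `q`, `rows`, `facet`, `facet.field`, `hl` and `hl.fl` parameters. The base URL and the field names (collection, fulltext) are assumptions for illustration; the talk does not give the actual index schema.

```python
from urllib.parse import urlencode


def build_solr_url(base_url, query, facet_fields=(), highlight_fields=(), rows=10):
    """Build a Solr /select URL. Nothing is sent over the network here."""
    params = [("q", query), ("rows", str(rows))]
    if facet_fields:
        # Ask Solr to return value counts for each listed field.
        params.append(("facet", "true"))
        params.extend(("facet.field", name) for name in facet_fields)
    if highlight_fields:
        # Ask Solr to return highlighted snippets from the listed fields.
        params.append(("hl", "true"))
        params.append(("hl.fl", ",".join(highlight_fields)))
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host and field names.
    print(build_solr_url(
        "http://localhost:8983/solr",
        "texas history",
        facet_fields=["collection"],
        highlight_fields=["fulltext"],
    ))
```

Keeping URL construction in one function makes it easy for a front end to add or drop facets without touching the rest of the search code.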

Current Architecture
[Diagram. Labeled components: Query, Digital Collections Server, Solr (Digital Object Index), Solr (Page Index), Spelling Suggestions, Results Page]
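The diagram's labels suggest a flow in which the Digital Collections Server receives a query, consults the two Solr indexes and a spelling-suggestion source, and assembles a results page. The sketch below is one possible reading of that flow, not the actual UNT implementation: the three backends are modeled as plain functions, and every name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# A backend is modeled as a function from a query string to a list of hits.
SearchFn = Callable[[str], List[str]]


@dataclass
class ResultsPage:
    query: str
    object_hits: List[str] = field(default_factory=list)
    page_hits: List[str] = field(default_factory=list)
    spelling_suggestions: List[str] = field(default_factory=list)


def handle_query(query: str, object_index: SearchFn, page_index: SearchFn,
                 spelling: SearchFn) -> ResultsPage:
    """Send one query to each backend and gather the answers into one page."""
    return ResultsPage(
        query=query,
        object_hits=object_index(query),
        page_hits=page_index(query),
        spelling_suggestions=spelling(query),
    )


if __name__ == "__main__":
    # In-memory stand-ins for the two Solr indexes and the spelling source.
    page = handle_query(
        "texsa",
        object_index=lambda q: ["object-1"],
        page_index=lambda q: ["object-1/page-3"],
        spelling=lambda q: ["texas"],
    )
    print(page)
```

Passing the backends in as functions keeps the page-assembly logic testable without a running Solr server.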

Customizable Relevance
  Combine full-text AND descriptive metadata
  Positive Boost to Title – (+20)
  Positive Boost to Subject – (+15)
  Positive Boost to Creator – (+14)
  Positive Boost to Metadata overall – (+5)
  Full-text = Neutral boost
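One way to express per-field boosts like these in Solr is the `qf` (query fields) parameter of the DisMax query parser, where `field^N` multiplies that field's contribution to the score. The sketch below builds such a parameter string from the boost values on the slide. It assumes the slide's "+20" style values correspond to `^20` boosts, that the index has fields named title, subject, creator, metadata and fulltext, and that the Solr version in use selects DisMax with `defType=dismax`; none of this is confirmed by the talk.

```python
from urllib.parse import urlencode

# Boost values from the slide; field names are hypothetical.
# None means "no explicit boost", i.e. the neutral default.
FIELD_BOOSTS = {
    "title": 20,
    "subject": 15,
    "creator": 14,
    "metadata": 5,
    "fulltext": None,
}


def build_qf(boosts):
    """Render a DisMax qf value such as 'title^20 subject^15 fulltext'."""
    parts = []
    for name, boost in boosts.items():
        parts.append(name if boost is None else f"{name}^{boost}")
    return " ".join(parts)


def build_dismax_params(user_query, boosts=FIELD_BOOSTS):
    """Return URL-encoded parameters for a boosted DisMax search."""
    return urlencode({"defType": "dismax", "q": user_query, "qf": build_qf(boosts)})


if __name__ == "__main__":
    print(build_qf(FIELD_BOOSTS))
    print(build_dismax_params("texas history"))
```

Keeping the boosts in a single mapping means relevance can be tuned by editing data rather than query-building code.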

Better results
  Helps to overcome IDF’s effect on results
  Results order more logically
  Takes advantage of both metadata and full-text
  User defined relevance ranking?
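One reading of the IDF point is that a rare word buried in full text gets a very high inverse document frequency and can outrank a document whose title matches a common word. The toy calculation below illustrates that reading. It uses the IDF formula from Lucene's classic TF-IDF similarity, but the scoring function is a deliberate simplification (field boost times IDF, nothing else), and the document frequencies, terms and field names are invented.

```python
import math

NUM_DOCS = 31000  # roughly the collection size given earlier in the talk

# Hypothetical document frequencies: one common term, one rare term.
DOC_FREQ = {"texas": 20000, "zephyr": 10}


def idf(doc_freq, num_docs=NUM_DOCS):
    """IDF as defined by Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def score(matches, boosts):
    """Simplified score: sum of (field boost * term IDF) over matched terms.

    matches is a list of (field, term) pairs; unlisted fields get boost 1.0.
    """
    return sum(boosts.get(fld, 1.0) * idf(DOC_FREQ[term]) for fld, term in matches)


NO_BOOSTS = {}
SLIDE_BOOSTS = {"title": 20.0, "subject": 15.0, "creator": 14.0,
                "metadata": 5.0, "fulltext": 1.0}

# Document A matches the common term in its title.
doc_a = [("title", "texas")]
# Document B matches only the rare term, somewhere in its full text.
doc_b = [("fulltext", "zephyr")]

if __name__ == "__main__":
    for label, boosts in (("no boosts", NO_BOOSTS), ("slide boosts", SLIDE_BOOSTS)):
        print(label, "A:", round(score(doc_a, boosts), 2),
              "B:", round(score(doc_b, boosts), 2))
```

Under these assumptions the full-text-only match wins when all fields are weighted equally, and the title match wins once the metadata boosts are applied.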

Questions?