1 Building Search Systems for Digital Library Collections
Mark E. Phillips, University of North Texas Libraries
Texas Conference on Digital Libraries, May 31, 2007, Austin, Texas

2 University of North Texas Libraries - Digital Initiatives
Library digital collections = digital objects
Three “systems”:
- Congressional Research Service Archive: 9,500+ CRS reports
- Portal to Texas History: 20,000+ records, 115,205 files
- UNT Libraries “Digital Collections”: 1,800+ records, 131,481 files
Digital object types:
- Images: 18,282
- Physical objects: 1,019
- Texts: 11,668
- Websites: 46
- Sound records: 20

3 Infrastructure
UNT Libraries Digital Library Infrastructure
- Highly customized installation of IndexData’s Keystone Digital Library System
- OAIS-based system
- Digital objects housed as XML files on the filesystem
- One XML file per digital object
- Supports simple, complex, and link records
- Custom workflow for batch ingest
- Manages web-presentable files along with descriptive and preservation metadata
- Digital masters stored in a separate system

4 Search 1.0
- Keystone-supplied search
- Zebra retrieval engine
- One index per “system”
- Highly customizable search system
- Vendor-supplied search interface and functionality

5 Search Issues
- Difficult configuration
- Problems retrieving large XML files (10 MB+)
- Search grammar not functioning correctly
- Relevance ranking was “magic”
- No custom searching
- Searching only at the digital object level

6 Search 1.5
- MySQL database for page-level searching
- In-Document Searching (IDS)
- Two levels of granularity (Zebra = object, MySQL = page)
- Easy customization
- More documentation on relevance ranking
- Logical search grammars
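The two levels of granularity on this slide can be illustrated with a small in-memory sketch. It does not reproduce the Zebra/MySQL setup: the corpus, object IDs, and function names below are hypothetical, and tokenization is a naive whitespace split. Object-level results are derived here by collapsing page-level hits, which is one possible design rather than how the UNT system actually worked.

```python
from collections import defaultdict

# Hypothetical toy corpus: digital object ID -> list of page texts.
OBJECTS = {
    "crs-001": ["Water rights in Texas", "Drought policy and water supply"],
    "crs-002": ["Federal highway funding", "Texas toll roads"],
}

def build_page_index(objects):
    """Map each lowercase term to the set of (object_id, page_number) pairs containing it."""
    index = defaultdict(set)
    for obj_id, pages in objects.items():
        for page_no, text in enumerate(pages, start=1):
            for term in text.lower().split():  # naive tokenizer, no punctuation handling
                index[term].add((obj_id, page_no))
    return index

def search_pages(index, term):
    """Page-level search: sorted (object_id, page_number) hits."""
    return sorted(index.get(term.lower(), set()))

def search_objects(index, term):
    """Object-level search: collapse page hits to the objects that contain them."""
    return sorted({obj_id for obj_id, _ in search_pages(index, term)})

index = build_page_index(OBJECTS)
print(search_pages(index, "water"))    # [('crs-001', 1), ('crs-001', 2)]
print(search_objects(index, "texas"))  # ['crs-001', 'crs-002']
```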

7 Search 1.5 – Issues
- Different search grammars: Zebra vs. MySQL full-text
- Scaling issues:
  - Search performance
  - System resources

8 Search System Criteria
- Customizable relevance ranking
- Sorting
- Simple search syntax
- Fielded searching
- Term modifiers
- Wildcard searches
- Fuzzy searches
- Proximity searches
- Range searches
- Boolean operators
- Grouping
- Caching
- Implemented as a web service
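Most of the query criteria above correspond to features of the Lucene query parser syntax (fielded terms, wildcards, fuzzy `~`, proximity `"..."~N`, ranges, Boolean operators, grouping). The helpers below are a hypothetical sketch that only builds query strings in that syntax; the field names (`title`, `date`) are assumptions, and the escape set follows the special characters listed for the classic Lucene query parser.

```python
# Characters with special meaning in the classic Lucene query parser.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

def escape(term):
    """Backslash-escape special characters so user input is treated literally."""
    return "".join("\\" + c if c in LUCENE_SPECIAL else c for c in term)

def fielded(field, value):
    """Fielded search, e.g. title:texas (field names here are hypothetical)."""
    return f"{field}:{value}"

def wildcard_prefix(term):
    """Wildcard search matching any term starting with `term`."""
    return f"{term}*"

def fuzzy(term):
    """Fuzzy search: match terms similar in spelling to `term`."""
    return f"{term}~"

def phrase(text, slop=None):
    """Quoted phrase; with a slop value it becomes a proximity search."""
    quoted = f'"{text}"'
    return f"{quoted}~{slop}" if slop is not None else quoted

def range_query(field, low, high, inclusive=True):
    """Range search: [a TO b] is inclusive, {a TO b} is exclusive."""
    open_, close = ("[", "]") if inclusive else ("{", "}")
    return f"{field}:{open_}{low} TO {high}{close}"

def group(*clauses, op="AND"):
    """Join clauses with a Boolean operator and wrap them in parentheses."""
    return "(" + f" {op} ".join(clauses) + ")"

q = group(fielded("title", "texas"), phrase("cotton farming", 5))
print(q)                                  # (title:texas AND "cotton farming"~5)
print(range_query("date", 1900, 1950))    # date:[1900 TO 1950]
```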

9 Search 2.0
Solr is an open source enterprise search server based on the Lucene Java search library.
- XML/HTTP based
- Hit highlighting
- Faceted search
- Caching
- Replication
- Web administration interface
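Because Solr is queried over plain HTTP, a search is just a URL with parameters. The sketch below builds (but does not send) a select request that turns on hit highlighting and faceting using the standard `q`, `rows`, `wt`, `hl`, `hl.fl`, `facet`, and `facet.field` parameters. The host, path, and field names are hypothetical.

```python
from urllib.parse import urlencode, urlparse, parse_qs

SOLR_SELECT = "http://localhost:8983/solr/select"  # hypothetical Solr endpoint

def solr_select_url(query, highlight_field="fulltext",
                    facet_fields=("subject", "creator"), rows=20):
    """Build a Solr select URL with highlighting and faceting enabled."""
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "xml"),               # ask for an XML response
        ("hl", "true"),              # enable hit highlighting
        ("hl.fl", highlight_field),  # field(s) to highlight
        ("facet", "true"),           # enable faceting
    ]
    # facet.field may be repeated, once per field to facet on
    params += [("facet.field", f) for f in facet_fields]
    return f"{SOLR_SELECT}?{urlencode(params)}"

url = solr_select_url("title:texas")
parsed = parse_qs(urlparse(url).query)
print(parsed["facet.field"])  # ['subject', 'creator']
```

Repeating `facet.field` rather than joining values keeps each facet independent, which is how Solr expects multiple facet fields to be passed.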

10 Current Architecture
[Architecture diagram. Components: Query; Digital Collections Server; two Solr instances, one holding the Digital Object Index and one the Page Index; Spelling Suggestions; Results Page.]
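The diagram does not show how the Digital Collections Server combines the two indexes. The following is a hypothetical sketch of one way to do it: query both the object index and the page index, then offer a spelling suggestion only when neither returns hits. The `fetch` callable, the index names `"objects"` and `"pages"`, and the response shape are all assumptions; injecting `fetch` keeps the sketch testable without a running Solr server.

```python
from typing import Callable, Dict, Optional

# Hypothetical: fetch(index_name, query) -> {"numFound": int, "docs": list, "suggestion": str (optional)}
Fetch = Callable[[str, str], Dict]

def build_results_page(query: str, fetch: Fetch) -> Dict:
    """Combine object-level and page-level hits into one results-page model."""
    objects = fetch("objects", query)  # Digital Object Index
    pages = fetch("pages", query)      # Page Index
    suggestion: Optional[str] = None
    # Design choice (assumption): only suggest a spelling when nothing matched.
    if objects["numFound"] == 0 and pages["numFound"] == 0:
        suggestion = objects.get("suggestion")
    return {
        "query": query,
        "objects": objects["docs"],
        "page_hits": pages["docs"],
        "did_you_mean": suggestion,
    }

def fake_fetch(index: str, query: str) -> Dict:
    """Stand-in for two Solr instances, for demonstration only."""
    canned = {
        ("objects", "texas"): {"numFound": 1, "docs": ["obj-1"]},
        ("pages", "texas"): {"numFound": 2, "docs": ["obj-1/p3", "obj-1/p7"]},
    }
    return canned.get((index, query), {"numFound": 0, "docs": [], "suggestion": "texas"})

print(build_results_page("texsa", fake_fetch)["did_you_mean"])  # texas
```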

16 Customizable Relevance
Combine full-text AND descriptive metadata
- Positive boost to Title (+20)
- Positive boost to Subject (+15)
- Positive boost to Creator (+14)
- Positive boost to metadata overall (+5)
- Full-text = neutral boost
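The slide does not say how the boosts were expressed in Solr. The sketch below shows two common options under that uncertainty: a dismax-style `qf` value, and a standard-parser query with one boosted clause per field. The field names are hypothetical, the boost values come from the slide, and "neutral" is taken to mean Lucene's default boost of 1. Note that the two forms are not identical: dismax scores with a disjunction maximum, while the ORed query sums clause scores.

```python
# Hypothetical field names; boost values taken from the slide.
BOOSTS = {"title": 20, "subject": 15, "creator": 14, "metadata": 5, "fulltext": 1}

def dismax_qf(boosts):
    """Dismax-style 'qf' value; a boost of 1 (neutral) is left implicit."""
    return " ".join(f"{f}^{b}" if b != 1 else f for f, b in boosts.items())

def boosted_query(user_query, boosts):
    """Standard Lucene parser alternative: one boosted clause per field, ORed."""
    return " OR ".join(
        f"{f}:({user_query})^{b}" if b != 1 else f"{f}:({user_query})"
        for f, b in boosts.items()
    )

print(dismax_qf(BOOSTS))  # title^20 subject^15 creator^14 metadata^5 fulltext
```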

17 Better Results
- Helps to overcome the effect of IDF (inverse document frequency) on results
- Results are ordered more logically
- Takes advantage of both metadata and full-text
- User-defined relevance ranking?
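One way to see why field boosts counteract IDF is a rough calculation with Lucene's classic similarity, where idf = 1 + ln(numDocs / (docFreq + 1)) and idf enters a term's score twice (once through the query weight, once through the field weight). The sketch ignores term frequency, length norms, and coordination, and the collection size and document frequencies are hypothetical. Without boosts, a rare word matched only in full-text dominates; a title boost of 20 lets a common word matched in the title outrank it.

```python
import math

def classic_idf(doc_freq, num_docs):
    """Lucene classic (DefaultSimilarity) idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))

def simplified_term_score(boost, doc_freq, num_docs):
    """Rough per-term score: boost * idf^2, assuming tf = 1 and equal norms."""
    return boost * classic_idf(doc_freq, num_docs) ** 2

N = 30_000  # hypothetical collection size
rare_in_fulltext = simplified_term_score(1, 10, N)      # rare term, full-text match
unboosted_common = simplified_term_score(1, 3_000, N)   # common term, no boost
common_in_title = simplified_term_score(20, 3_000, N)   # common term, title boost
print(unboosted_common < rare_in_fulltext < common_in_title)  # True
```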

18 Questions?

