Building Search Systems for Digital Library Collections. Mark E. Phillips, University of North Texas Libraries. Texas Conference on Digital Libraries, May 31, 2007.

Presentation transcript:

Slide 1: University of North Texas Libraries. Building Search Systems for Digital Library Collections. Mark E. Phillips, Texas Conference on Digital Libraries, May 31, 2007, Austin, Texas

Slide 2: University of North Texas Libraries - Digital Initiatives
– Library digital collections: 31,000+ digital objects
– Three "systems":
  – Congressional Research Service (CRS) Archive: 9,500+ CRS reports
  – Portal to Texas History: 20,000+ records, 115,205 files
  – UNT Libraries "Digital Collections": 1,800+ records, 131,481 files
– Digital object types:
  – Images: 18,282
  – Physical objects: 1,019
  – Texts: 11,668
  – Websites: 46
  – Sound recordings: 20
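
The per-type counts on this slide add up to 31,035, which is consistent with the "31,000+" headline figure. A short Python tally (the counts are copied from the slide; nothing else is assumed):

```python
# Per-type counts copied from slide 2.
object_types = {
    "Images": 18_282,
    "Physical objects": 1_019,
    "Texts": 11_668,
    "Websites": 46,
    "Sound recordings": 20,
}

total = sum(object_types.values())
print(f"Total digital objects: {total:,}")
for name, count in object_types.items():
    # Share of the whole collection, rounded to one decimal place.
    print(f"{name:<17} {count:>6,}  {100 * count / total:5.1f}%")
```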

Slide 3: Infrastructure
UNT Libraries digital library infrastructure:
– Highly customized installation of Index Data's Keystone Digital Library System
– OAIS-based system
– Digital objects stored as XML files on the file system, one XML file per digital object
– Supports simple, complex, and link records
– Custom workflow for batch ingest
– Manages web-presentable files as well as descriptive and preservation metadata
– Digital masters stored in a separate system
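
To make the "one XML file per digital object" layout concrete, here is a minimal Python sketch that walks a directory and loads each file as one record. The `<record>`, `<title>` and `<type>` element names and the directory layout are assumptions for illustration, not Keystone's actual schema, and the sketch does not model the simple/complex/link record distinction.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def load_records(root: Path) -> list[dict]:
    """Read every *.xml file under root as one digital-object record.

    The element names used here are hypothetical stand-ins for
    whatever the real object schema defines.
    """
    records = []
    for path in sorted(root.rglob("*.xml")):
        rec = ET.parse(path).getroot()
        records.append({
            # One file per object, so the file name doubles as the object ID.
            "id": path.stem,
            "title": rec.findtext("title", default=""),
            "type": rec.findtext("type", default=""),
        })
    return records


# Demo: write two sample object files to a temporary directory and load them.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "obj001.xml").write_text(
        "<record><title>Sample map</title><type>image</type></record>")
    (root / "obj002.xml").write_text(
        "<record><title>Sample report</title><type>text</type></record>")
    records = load_records(root)

for r in records:
    print(r)
```

Keeping each object in its own file makes ingest and per-object updates simple, but it also means a search system has to build and maintain its own index over those files.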

Slide 4: Search 1.0
Keystone-supplied search:
– Zebra retrieval engine
– One index per "system"
– Highly customizable search system
– Vendor-supplied search interface and functionality

Slide 5: Search 1.0 - Issues
– Difficult configuration
– Problems retrieving large XML files (10 MB+)
– Search grammar did not function correctly
– Relevance ranking was "magic"
– No custom searching
– Searching only at the digital-object level

Slide 6: Search 1.5
MySQL database for page-level searching:
– In-Document Searching (IDS)
– Two levels of granularity (Zebra = object, MySQL = page)
– Easy customization
– Better documentation of relevance ranking
– Logical search grammars
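
The two levels of granularity can be pictured as two kinds of index entry built from the same object: one for the whole object and one per page. This Python sketch shows only that data shape; it is not the Zebra or MySQL setup itself. The class, function, and field names are hypothetical, and a plain substring test stands in for MySQL full-text matching.

```python
from dataclasses import dataclass


@dataclass
class DigitalObject:
    object_id: str
    title: str
    pages: list[str]  # full text, one string per page


def object_document(obj: DigitalObject) -> dict:
    """Object-level entry: metadata plus all pages concatenated."""
    return {"id": obj.object_id, "title": obj.title,
            "fulltext": "\n".join(obj.pages)}


def page_documents(obj: DigitalObject) -> list[dict]:
    """Page-level entries, so a hit can point at a specific page."""
    return [
        {"id": f"{obj.object_id}_p{n}", "object_id": obj.object_id,
         "page": n, "text": text}
        for n, text in enumerate(obj.pages, start=1)
    ]


def pages_matching(obj: DigitalObject, term: str) -> list[int]:
    """In-document search: page numbers whose text contains the term."""
    term = term.lower()
    return [n for n, text in enumerate(obj.pages, start=1)
            if term in text.lower()]


report = DigitalObject("crs001", "Sample CRS report",
                       ["Introduction", "Budget figures", "Conclusion"])
print(object_document(report)["id"])
for doc in page_documents(report):
    print(doc["id"], doc["page"])
print(pages_matching(report, "budget"))
```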

Slide 7: Search 1.5 – Issues
– Different search grammars (Zebra vs. MySQL full-text)
– Scaling issues:
  – Search performance
  – System resources

Slide 8: Search System Criteria
– Customizable relevance ranking
– Sorting
– Simple search syntax:
  – Fielded searching
  – Term modifiers: wildcard, fuzzy, proximity, and range searches
  – Boolean operators
  – Grouping
– Caching
– Implemented as a web service
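
Each syntax criterion on this slide corresponds to a feature of the classic Lucene query syntax, which Solr exposes. The Python sketch below lists one example query per feature and adds small helpers for escaping user input and for field grouping. The field names (`title`, `subject`, `date`) are hypothetical, and the helpers assume single-word values (whitespace is not escaped).

```python
# Characters the classic Lucene query parser treats as syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')

# One illustrative query per syntax feature; field names are hypothetical.
EXAMPLES = {
    "fielded": "title:texas",
    "wildcard": "tex*",
    "fuzzy": "roam~",
    "proximity": '"texas history"~5',
    "range": "date:[1900 TO 1950]",
    "boolean": "texas AND NOT oklahoma",
    "grouping": "(texas OR oklahoma) AND title:history",
}


def escape(term: str) -> str:
    """Backslash-escape Lucene special characters in a user-supplied term."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in term)


def fielded(field: str, value: str) -> str:
    """Build a fielded clause such as title:texas from untrusted input."""
    return f"{field}:{escape(value)}"


def any_of(field: str, values: list[str]) -> str:
    """Field grouping: match any of several single-word values in one field."""
    return f"{field}:(" + " OR ".join(escape(v) for v in values) + ")"


for feature, query in EXAMPLES.items():
    print(f"{feature:<10} {query}")
print(any_of("subject", ["maps", "railroads"]))
```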

Slide 9: Search 2.0
Solr is an open-source enterprise search server based on the Lucene Java search library. Features:
– XML/HTTP-based interface
– Hit highlighting
– Faceted search
– Caching
– Replication
– Web administration interface
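
Because Solr speaks XML over HTTP, a client only needs to build a query URL and parse the XML that comes back. The Python sketch below does both using the standard `q`, `rows`, `wt`, `hl`, `hl.fl`, `facet` and `facet.field` request parameters. The host, core name, and field names are assumptions, and `SAMPLE_RESPONSE` is a hand-written example in Solr's XML response format, not real output. A real response with `hl=true` would also contain a `highlighting` section, which is not parsed here.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/objects/select"  # hypothetical host/core


def build_select_url(query: str, rows: int = 10) -> str:
    params = [
        ("q", query),
        ("rows", rows),
        ("wt", "xml"),            # XML response over plain HTTP
        ("hl", "true"),           # hit highlighting
        ("hl.fl", "fulltext"),    # hypothetical field to highlight
        ("facet", "true"),        # faceted search
        ("facet.field", "type"),  # hypothetical facet field
    ]
    return SOLR_SELECT + "?" + urlencode(params)


# Hand-written sample in Solr's XML response layout.
SAMPLE_RESPONSE = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <result name="response" numFound="2" start="0">
    <doc><str name="id">obj001</str><str name="title">Sample map</str></doc>
    <doc><str name="id">obj002</str><str name="title">Sample report</str></doc>
  </result>
  <lst name="facet_counts">
    <lst name="facet_fields">
      <lst name="type"><int name="image">1</int><int name="text">1</int></lst>
    </lst>
  </lst>
</response>"""


def parse_response(xml_text: str) -> tuple[int, list[str], dict[str, int]]:
    """Return (numFound, hit IDs, facet counts for the 'type' field)."""
    root = ET.fromstring(xml_text)
    result = root.find("result")
    hits = [doc.findtext("str[@name='id']") for doc in result.findall("doc")]
    facet = root.find("lst[@name='facet_counts']/lst[@name='facet_fields']"
                      "/lst[@name='type']")
    counts = {} if facet is None else {i.get("name"): int(i.text) for i in facet}
    return int(result.get("numFound")), hits, counts


print(build_select_url("texas history"))
print(parse_response(SAMPLE_RESPONSE))
```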

Slide 10: Current Architecture
Diagram elements: query; Digital Collections server; Solr (digital object index and page index); spelling suggestions; results page.
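
One way to read the diagram is that the Digital Collections server takes a query, consults the object index, the page index, and a spelling-suggestion source, and merges the answers into one results page. The Python sketch below shows that assembly step under that assumption. The slide does not say how the lookups are ordered or combined, and the function names and in-memory backends are hypothetical stand-ins for HTTP calls to Solr.

```python
from typing import Callable

# Each backend takes the user's query and returns a list of hit identifiers.
# In a real deployment these would be HTTP requests to Solr; here they are
# hypothetical in-memory stand-ins so the sketch runs on its own.
Backend = Callable[[str], list[str]]


def handle_query(query: str, object_index: Backend, page_index: Backend,
                 spelling: Backend) -> dict:
    """Assemble one results page from the three lookups in the diagram."""
    return {
        "query": query,
        "objects": object_index(query),   # object-level hits
        "pages": page_index(query),       # page-level hits inside objects
        "did_you_mean": spelling(query),  # spelling suggestions
    }


# Hypothetical toy data keyed by lowercased query string.
OBJECTS = {"texas": ["obj001", "obj002"]}
PAGES = {"texas": ["obj002_p4"]}
SPELLING = {"texsa": ["texas"]}


def lookup(table: dict[str, list[str]]) -> Backend:
    """Wrap a dict as a backend with case-insensitive lookup."""
    return lambda q: table.get(q.lower(), [])


results = handle_query("texsa", lookup(OBJECTS), lookup(PAGES), lookup(SPELLING))
print(results)
```

Passing the backends in as functions keeps the page-assembly logic independent of where each index actually lives.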

Slide 16: Customizable Relevance
Combine full text and descriptive metadata:
– Positive boost to title (+20)
– Positive boost to subject (+15)
– Positive boost to creator (+14)
– Positive boost to metadata overall (+5)
– Full text: neutral boost
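
A minimal sketch of how these boosts could be expressed in Solr, assuming the slide's values map to Lucene-style multiplicative field boosts (`title^20`) passed through the dismax `qf` parameter (selected with `defType=dismax` in current Solr versions). The field names are hypothetical. `toy_score` is a simplified disjunction-max illustration of why boosts reorder results, not Lucene's real scoring formula.

```python
from urllib.parse import urlencode

# Boosts from slide 16, read here as multiplicative field boosts.
# Field names are hypothetical.
FIELD_BOOSTS = {
    "title": 20,
    "subject": 15,
    "creator": 14,
    "metadata": 5,   # catch-all field holding all descriptive metadata
    "fulltext": 1,   # neutral: a boost of 1 leaves the score unchanged
}


def dismax_qf(boosts: dict[str, float]) -> str:
    """Render boosts as a dismax 'qf' value, e.g. 'title^20 subject^15'."""
    return " ".join(f"{field}^{boost}" for field, boost in boosts.items())


def dismax_query_string(query: str) -> str:
    return urlencode({"q": query, "defType": "dismax",
                      "qf": dismax_qf(FIELD_BOOSTS)})


def toy_score(doc: dict[str, str], term: str) -> int:
    """Simplified disjunction-max: the best boosted field match wins.

    Lucene's real score also involves tf, idf and length normalization.
    """
    term = term.lower()
    return max(boost * doc.get(field, "").lower().split().count(term)
               for field, boost in FIELD_BOOSTS.items())


title_hit = {"title": "Texas railroads", "fulltext": "a survey of rail lines"}
fulltext_hit = {"title": "Annual report", "fulltext": "texas " * 5}
ranked = sorted([("title_hit", title_hit), ("fulltext_hit", fulltext_hit)],
                key=lambda item: toy_score(item[1], "texas"), reverse=True)
print(dismax_query_string("texas"))
print([name for name, _ in ranked])
```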

Slide 17: Better Results
– Helps to overcome the effect of inverse document frequency (IDF) on results
– Results are ordered more logically
– Takes advantage of both metadata and full text
– User-defined relevance ranking?
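
The slide does not spell out the IDF effect, so here is some background. In Lucene's classic TF-IDF scoring, idf(t) = 1 + ln(numDocs / (docFreq + 1)), and idf enters a term's score squared. As a result, a rare word matching only the full text can outscore a common word matching the title. The Python sketch below compares that squared idf gap with a title boost of 20. It ignores tf, length norms, and coordination. The document frequencies are invented for illustration, and the collection size simply reuses the total from slide 2.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency as in Lucene's classic TF-IDF similarity."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


NUM_DOCS = 31_035  # illustrative collection size (total from slide 2)

rare = classic_idf(3, NUM_DOCS)        # hypothetical word in three full texts
common = classic_idf(5_000, NUM_DOCS)  # hypothetical word in thousands of records

# Classic scoring multiplies by idf squared, which widens the gap.
idf_gap = (rare / common) ** 2
print(f"rare idf = {rare:.2f}, common idf = {common:.2f}, "
      f"squared ratio = {idf_gap:.1f}")
print("title boost of 20 outweighs the gap:", 20 > idf_gap)
```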

Slide 18: Questions?
