Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011.


1 Overview of IU Digital Collections Search Hui Zhang Jon Dunn Indiana University Digital Library Program IU Digital Library Brown Bag October 19, 2011

2 Outline
◦ Introduction and motivation – Jon
◦ Demo – Jon
◦ Technical implementation – Hui
◦ Next steps and future work – Jon

3 Why cross-collection search?
Support discovery across multiple content formats, collections, and repositories at IU
Use cases:
◦ Multiple formats/collections within a single thematic grouping (e.g. Hoagy Carmichael)
◦ Show off the richness and diversity of IU’s digital collections (PR – see open.iu.edu)
◦ Find digital content at IU for teaching or research use


5 Digital collections evolution: Discrete collection web sites

6 Digital collections evolution: Services
◦ METS Navigator
◦ Archives Online
◦ PhotoCat
◦ Video Streaming Service
◦ Variations

7 Digital collections evolution: Services
Advantages
◦ Can develop workflows for content ingestion and description that are both optimized and scalable
◦ Content stored in a common repository (Fedora)
◦ Can develop discovery interfaces optimized for particular content (e.g. images vs. music)
◦ Common services to expose content to other platforms (e.g. Google)
Disadvantages
◦ “Siloing” discovery by content type can be an issue

8 Cross-collection search: First iteration
Only selected collections with metadata in Fedora
◦ Includes Archives Online and most image collections
◦ Excludes video streaming, Variations, encoded text, IUScholarWorks, and various “legacy” collections
Metadata only (MODS)
◦ Stored natively as MODS in Fedora
◦ Disseminated on the fly from other formats (PhotoCat2)
◦ Transformed via XSLT from EAD (Archives Online)

9 Cross-collection search: First iteration Demonstration

10 Challenge: Item-level records from EAD
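
An EAD finding aid describes a whole collection in one file, while a cross-collection index wants one record per item. The slides do not say how this was solved (the production pipeline used XSLT), so the Python below is only a hypothetical sketch of one approach. It assumes un-namespaced EAD 2002 with `<c>` or `<c01>`…`<c12>` components, treats any component whose `<did>` contains a `<dao>` link as an item, and copies the collection and series titles down into each record; the sample EAD is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, un-namespaced EAD 2002 fragment used only for this sketch.
SAMPLE_EAD = """<ead>
  <archdesc level="collection">
    <did><unittitle>Example Collection</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Series 1</unittitle></did>
        <c02 level="item">
          <did>
            <unittitle>Example item</unittitle>
            <unitdate>1950</unitdate>
            <dao href="http://example.org/item/1"/>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""

# EAD components may be unnumbered <c> or numbered <c01> through <c12>.
COMPONENT_TAGS = {"c"} | {f"c{n:02d}" for n in range(1, 13)}


def item_records(ead_xml):
    """Yield one flat record per component whose <did> links to a digital object."""
    root = ET.fromstring(ead_xml)
    collection = root.findtext("archdesc/did/unittitle", default="").strip()

    def walk(elem, ancestors):
        for child in elem:
            if child.tag not in COMPONENT_TAGS:
                continue
            title = child.findtext("did/unittitle", default="").strip()
            dao = child.find("did/dao")  # assumes <dao> sits inside <did>
            if dao is not None:
                yield {
                    "title": title,
                    "date": child.findtext("did/unitdate", default="").strip(),
                    "collection": collection,
                    # Series/subseries titles give the item its archival context.
                    "context": " > ".join(ancestors),
                    "link": dao.get("href", ""),
                }
            yield from walk(child, ancestors + [title])

    dsc = root.find("archdesc/dsc")
    if dsc is not None:
        yield from walk(dsc, [])


for record in item_records(SAMPLE_EAD):
    print(record)
```

Carrying the collection and series titles into each record is one way to keep an item-level hit understandable when it is shown outside its finding aid.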

11 Apache Solr Overview
A Java-based web application: an open source search server with Apache Lucene at its core
Demonstration
Solr vs. relational database
◦ Pros: full-text search, text analysis, flexible fields
◦ Cons: no relational operations on fields
Solr vs. Lucene
◦ Pros: web application, centralized configuration, faceting
◦ Cons: security, slower
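
Because Solr runs as a web application, clients search it with ordinary HTTP requests instead of linking against Lucene. A minimal Python sketch that builds a faceted query URL follows. The endpoint `http://localhost:8983/solr/select` and the facet field names `format` and `collection` are assumptions; `q`, `rows`, `wt`, `facet`, and `facet.field` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Assumed Solr endpoint; a real deployment's host and core path will differ.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_query_url(text, facet_fields=("format", "collection"), rows=10):
    """Return a Solr select URL for a keyword query with facet counts enabled."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json"), ("facet", "true")]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return SOLR_SELECT + "?" + urlencode(params)


print(build_query_url("medical students"))
```

Solr answers with the matching documents plus, for each `facet.field`, the field's values and their hit counts, which is the raw material for facet browsing in a discovery interface.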

12 Solr Schema and Configuration
Schema: specifies how the index is built
◦ field, field type
◦ dynamicField, copyField, uniqueKey
◦ Text analysis: stop words, stemming, synonyms, tokenization
Configuration: specifies settings for Solr itself, query handling, and data import
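
In Solr, a text-analysis chain is declared in the schema rather than written as code. The Python below is only a toy model of such a chain (tokenize and lowercase, remove stop words, expand synonyms, stem) to show why a query and a record meet on normalized terms. The stop-word list and synonym map are made up, and the suffix-stripping stemmer is a deliberately crude stand-in for a real one such as Porter.

```python
import re

# Hypothetical stop-word list and synonym map, for illustration only.
STOP_WORDS = {"a", "an", "and", "of", "the"}
SYNONYMS = {"photos": ["photographs"]}


def naive_stem(token):
    """Strip a few English suffixes: a crude stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "es", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, lowercase, remove stop words, expand synonyms, then stem."""
    tokens = re.findall(r"\w+", text.lower())
    tokens = [t for t in tokens if t not in STOP_WORDS]
    expanded = []
    for t in tokens:
        expanded.append(t)
        expanded.extend(SYNONYMS.get(t, []))
    return [naive_stem(t) for t in expanded]


print(analyze("Photos of the Medical Students"))
# → ['photo', 'photograph', 'medical', 'student']
```

Applying the same chain at index time and at query time is what lets a search for “student” match a record containing “Students”.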

13 Converting MODS to Solr XML
Solr XML
◦ … …
◦ Can simply be POSTed to the Solr index
Translation of MODS to Solr XML
◦ Uses XSLT
◦ Called by the indexing program
Extract facet values
◦ Format: mods:typeOfResource
◦ Collection: customized based on the item’s Fedora PID
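
The production conversion used XSLT called from the indexing program. Purely for illustration, here is the same idea in Python with `xml.etree.ElementTree`: read a MODS record, copy a few fields into a Solr `<add><doc>` document, take the format facet from `mods:typeOfResource`, and derive the collection facet from the Fedora PID. The MODS namespace URI is the standard one; the Solr field names (`title`, `format_facet`, `collection_facet`, …) and the PID-prefix-to-collection table are hypothetical. The sample values are taken from the example record on the next slide.

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical mapping from Fedora PID namespace to a collection facet label.
COLLECTION_BY_PID_PREFIX = {"iudl": "Example Collection"}

SAMPLE_MODS = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Women Medical Students</title></titleInfo>
  <typeOfResource>still image</typeOfResource>
  <subject><topic>Medical students</topic></subject>
</mods>"""


def mods_to_solr_xml(pid, mods_xml):
    """Build a Solr <add><doc> document from one MODS record (field names are hypothetical)."""
    mods = ET.fromstring(mods_xml)
    fields = [("id", pid)]

    title = mods.findtext("mods:titleInfo/mods:title", default="", namespaces=MODS_NS)
    if title:
        fields.append(("title", title))
    for topic in mods.findall("mods:subject/mods:topic", MODS_NS):
        if topic.text:
            fields.append(("subject", topic.text))

    # Format facet comes straight from mods:typeOfResource.
    resource_type = mods.findtext("mods:typeOfResource", default="", namespaces=MODS_NS)
    if resource_type:
        fields.append(("format_facet", resource_type))

    # Collection facet is derived from the PID (in this sketch: its namespace prefix).
    prefix = pid.split(":", 1)[0]
    fields.append(("collection_facet", COLLECTION_BY_PID_PREFIX.get(prefix, "Unknown")))

    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    for name, value in fields:
        ET.SubElement(doc, "field", name=name).text = value
    return ET.tostring(add, encoding="unicode")


print(mods_to_solr_xml("iudl:10000", SAMPLE_MODS))
```

The resulting string is the kind of payload that is POSTed to Solr's XML update handler, followed by a commit to make the document searchable.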

14 iudl:10000 Women Medical Students Photographic Services, Photographer Photographic Services Medical students Bloomington Indiana still image Photographs 04-13-1956 1956 P0028020 /archives/photos/ …

15 Solr Indexing
Carried out by two Java programs running under DLP’s Fedora Index Service framework
The service can be invoked by a RESTful HTTP request; Solr indexing is triggered based on conditions specified in the properties file
The MODS records are either extracted from the Fedora repository (where they are stored natively) or generated by the getMODS disseminator (PhotoCat2 collections)
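
The slide says indexing is triggered by an HTTP request, with the indexing conditions read from a properties file, but it gives neither the property names nor the framework's API. The Python below is therefore a hypothetical sketch (the real programs were Java): it parses simple Java-style `key=value` properties, decides from an assumed `index.pid.prefixes` key whether a PID should be indexed, and prepares, without sending, a POST of Solr XML to an assumed `solr.update.url`.

```python
from urllib.request import Request

# Hypothetical properties text; the real Fedora Index Service keys are not given in the slides.
SAMPLE_PROPERTIES = """
# PID namespaces whose objects should be sent to Solr
index.pid.prefixes = iudl, archives
solr.update.url = http://localhost:8983/solr/update
"""


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and #/! comments.

    Only a subset of the Java .properties syntax is handled (no ':' separators
    or line continuations).
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#!":
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def should_index(pid, props):
    """Index only objects whose PID namespace is listed in the properties."""
    prefixes = {p.strip() for p in props.get("index.pid.prefixes", "").split(",") if p.strip()}
    return pid.split(":", 1)[0] in prefixes


def build_update_request(solr_xml, props):
    """Prepare (but do not send) a POST of Solr XML; pass it to urlopen() to send."""
    return Request(
        props["solr.update.url"],
        data=solr_xml.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


props = parse_properties(SAMPLE_PROPERTIES)
if should_index("iudl:10000", props):
    req = build_update_request('<add><doc><field name="id">iudl:10000</field></doc></add>', props)
    print(req.get_method(), req.full_url)
```

Keeping the conditions in a properties file, as the slide describes, means the set of indexed collections can change without recompiling the indexer.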

16 Overview of Blacklight
An open source project developed for libraries, with many potential uses:
◦ As a library catalog
◦ As the discovery interface to a digital repository
Optimized to handle diverse content (facet browsing)
Originally developed by the University of Virginia; has a growing community of active contributors and users
Now part of the Hydra Project
Written in Ruby, runs on Rails, requires Solr

17 Customize Blacklight for DLP Collections
Integrate Blacklight with the MODS-based index
◦ Blacklight by default expects MARC fields
New functions and features
◦ Render thumbnails in the result view
◦ Use the collection website as the landing page
Style and layout
◦ Standard IU banner and footer
◦ Color, font, and window size

18 Future Improvements
Automatic update of the Solr index
◦ Fedora repository notifies the Solr indexing program of item updates via JMS
Include full-text content
◦ It is challenging to have full-text content and metadata in one index
◦ Optimize the indexing and search algorithms
◦ Search against full text and use metadata as facets
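
The slides only say that Fedora would notify the indexer of item updates via JMS. A real listener would use a JMS client library; in this hypothetical Python sketch a `queue.Queue` stands in for the JMS topic, and each message is a plain dict with assumed `action` and `pid` keys rather than Fedora's actual message format. Ingests and modifications trigger reindexing, purges trigger deletion from the index, and other events are ignored; the `demo:` PIDs are invented.

```python
import queue


def drain_events(events, reindex, delete):
    """Consume pending change events; return how many were acted on."""
    acted = 0
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return acted
        action, pid = event.get("action"), event.get("pid")
        if action in ("ingest", "modify"):
            reindex(pid)  # rebuild this object's Solr document
        elif action == "purge":
            delete(pid)  # remove the object from the index
        else:
            continue  # ignore events the indexer does not care about
        acted += 1


# Usage with stand-in callbacks that only record what would happen.
log = []
events = queue.Queue()
for e in ({"action": "modify", "pid": "demo:1"},
          {"action": "purge", "pid": "demo:2"},
          {"action": "view", "pid": "demo:3"}):
    events.put(e)
print(drain_events(events,
                   lambda pid: log.append(("reindex", pid)),
                   lambda pid: log.append(("delete", pid))))
print(log)
```

Reacting to per-object change events keeps the index current without periodically re-crawling the whole repository.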

19 Future Improvements (cont’d)
Add more collections
◦ Other collections from Fedora
◦ Non-Fedora DLP collections
◦ Archives of Institutional Memory
◦ IUScholarWorks Repository?
◦ IUPUI Digital Collections (CONTENTdm)?
Conduct usability evaluation
Explore integration with the new Blacklight-based discovery layer for IUCAT
Variations on Video IMLS grant
◦ Hydra/Blacklight-based discovery on PBCore

20 Questions?
Beta: http://webapp1.dlib.indiana.edu/dcs/
Send comments to: diglib@indiana.edu

