DEVELOPMENTS AND TRENDS IN THE LMS AND DISCOVERY ARENAS Marshall Breeding Director for Innovative Technology and Research Vanderbilt University Library Founder and Publisher, Library Technology Guides http://www.librarytechnology.org/ http://twitter.com/mbreeding Program on National Infrastructure 26 August 2010 Stockholm
Seminar Goal The aim of the seminar is to create an understanding of the infrastructural challenges and to contribute to a plan of action for the future. Library Directors and System managers will discuss different solutions for the availability and management of e-resources in order to make strategic choices for the development of the infrastructure at a national level.
Presentation Themes Trends and recent developments in the library system market, resource discovery services and resource management as indexing/knowledge bases Creation and management of data wells for metadata Ongoing discussion regarding options for building data wells in-house, open source or partnering with commercial actors.
Summary Developments and trends in the library system market, regarding resource discovery services and resource management as indexing/knowledge bases. If I should emphasize something special, it is the question of data wells for metadata. We have been investigating the data well question in a report (please see below, Summary in English), and there is a discussion about building data wells in-house, with open source, or with commercial actors. We have also invited three commercial actors to the seminar. Not an easy question! A related topic is the national catalogue LIBRIS as a local OPAC for the libraries: how can LIBRIS work not only as the national catalogue but also as a local OPAC? The third topic is the future of Ex Libris MetaLib/SFX in Sweden. We're happy with SFX, but not with MetaLib/federated search; how should we continue? The main focus of the seminar will be resource management/data wells, although the LIBRIS and MetaLib/SFX questions need to be included in the discussions.
Disjointed approach to information and service delivery Silos Prevail Books: Library OPAC (ILS module) Articles: Aggregated content products, e-journal collections OpenURL linking services E-journal finding aids (Often managed by link resolver) Local digital collections ETDs, photos, rich media collections Metasearch engines All searched separately
Lack of unified Web presence Users don’t understand the distinctions we make Catalog? Articles and Databases? Digital Library? Search our Site? Search interfaces based on content formats or management applications Non-library Web sites are much more unified
A simple vision A single point of entry to all the content and services offered by the library …but with precision, nuanced sophistication, and multiple dimensions Search:
Online Catalog vs. Discovery Layer Online Catalog Interface conventions from an earlier Web era Scope: Tied to the ILS and its content domain Discovery Layer Modern interface elements Scope: aims to address broad range of components that constitute library collections
Social discovery Tags, user-supplied ratings and reviews Leverage social networking interactions to assist readers in identifying interesting materials: BiblioCommons Leverage use data for a recommendation service of scholarly content based on link resolver data: Ex Libris bX service
Deep indexing Metadata can no longer serve as the only basis for discovery Increasing opportunities to search the full contents Google Library Print, Google Publisher, Open Content Alliance, government publications, etc. High-quality metadata will improve search precision Commercial search providers already offer “search inside the book” and searching across the full text of large book collections Important transition to full-text book search beginning in library projects HathiTrust indexing 6 million volumes Must become a routine component of library discovery Deep search highly improved by high-quality metadata
Discovery product Trend Initial products focused on technology AquaBrowser, Endeca, Primo, Encore, VuFind Mostly locally-installed software Current phase focused on integrated access to both local content and remote articles to deliver Web-scale discovery. Examples: Summon (Serials Solutions) WorldCat Local (OCLC) EBSCO Discovery Service (EBSCO) Primo Central Encore Synergy
Beyond Federated search Federated Search / Metasearch use real-time queries against multiple information targets No centralized index – presentation of dynamic results Shallow results -- only a few results initially fetched from each target Difficult to calculate relevancy Performance challenges
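The limits listed on this slide can be illustrated with a minimal sketch. It assumes hypothetical in-memory targets (the `TARGETS` dictionary and all function names are invented for illustration); a real metasearch engine would send network queries to each provider instead. Each target returns only its first few hits in its own order, so the merged list has no common relevance score to sort by.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "targets"; a real metasearch engine would query
# remote services over the network rather than local lists.
TARGETS = {
    "catalog": ["Library systems handbook", "Metadata basics", "Search engines"],
    "articles": ["Federated search performance", "Search relevance ranking", "Metadata quality"],
    "repository": ["Thesis on search interfaces", "Digitized photographs"],
}


def query_target(name, term, limit=2):
    """Return at most `limit` hits from one target, in the target's own order."""
    hits = [title for title in TARGETS[name] if term.lower() in title.lower()]
    return [(name, title) for title in hits[:limit]]


def federated_search(term, limit=2):
    """Query every target in parallel and concatenate the shallow result sets."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(query_target, name, term, limit) for name in TARGETS]
        results = []
        for future in futures:
            # The slowest target determines the overall response time.
            results.extend(future.result())
    return results


results = federated_search("search", limit=1)
```

With `limit=1`, the second matching article is never fetched, which is the "shallow results" problem in miniature: the user sees whatever each target happened to return first.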
Beyond local discovery interfaces Pre-populated indexes Web-scale Exploits the full depth and breadth of library collections Beyond the bounds of the local library’s collection Targets the universe of objective, vetted library content
Pre-populated discovery services New-generation interface Harvested local content ILS metadata Institutional repositories, ETDs, Digital Collection platforms Vendor-supplied indexes of library content E-journals, databases, e-books Full-text and metadata corresponding to e-content subscriptions Book collections beyond local library collections Includes full-text indexing to the fullest extent possible
Federated Search Search: Digital Collections ProQuest EBSCOhost … MLA Bibliography ABC-CLIO Search Results Real-time query and responses ILS Data
Discovery Interface Search: Digital Collections ProQuest EBSCOhost … MLA Bibliography ABC-CLIO Search Results Real-time query and responses ILS Data Local Index MetaSearch Engine
Web-scale Search Search: Digital Collections ProQuest EBSCOhost … MLA Bibliography ABC-CLIO Search Results Pre-built harvesting and indexing Consolidated Index ILS Data
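As a rough sketch of the consolidated-index idea, the example below harvests records from several sources in advance, builds one inverted index, and answers queries from that index alone. The records, identifiers, and the simple term-count scoring are assumptions made for illustration; production services such as those built on Lucene use far more refined ranking.

```python
from collections import defaultdict

# Hypothetical records already harvested from different sources.
RECORDS = [
    {"id": "ils:1", "source": "ILS", "title": "Search engines for libraries"},
    {"id": "art:1", "source": "Articles", "title": "Relevance ranking in search engines"},
    {"id": "dig:1", "source": "Digital Collections", "title": "Historic photographs of Stockholm"},
]


def tokenize(text):
    """Split text into lowercase terms."""
    return text.lower().split()


def build_index(records):
    """Build an inverted index: term -> {record id: occurrences of the term}."""
    index = defaultdict(dict)
    for record in records:
        for term in tokenize(record["title"]):
            index[term][record["id"]] = index[term].get(record["id"], 0) + 1
    return index


def search(index, query):
    """Score records by matching term occurrences; best first, ties by id."""
    scores = defaultdict(int)
    for term in tokenize(query):
        for record_id, count in index.get(term, {}).items():
            scores[record_id] += count
    return sorted(scores, key=lambda rid: (-scores[rid], rid))


INDEX = build_index(RECORDS)
hits = search(INDEX, "search engines")
```

Because every source is scored in the same index, catalog records and articles can be ranked against each other, and no remote system is contacted at query time.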
Web-scale Search + Federated Search Search: Digital Collections ProQuest … MLA Bibliography ABC-CLIO Search Results Pre-built harvesting and indexing Consolidated Index ILS Data Fed Search Non-harvestable Resources
Discovery Delivery Discovered content delivered through original repositories Publisher agreements generally preclude exposing content for direct access Should not necessarily circumvent core role of publisher
Benefits Libraries: increased access to high-cost electronic content Users: Easier access to research resources Publishers: Increased impact of content products IT perspective: advance harvesting makes more efficient use of resources than simultaneous real-time queries
Toward a Large-scale National Discovery environment
Obstacles and Challenges Scalable technology platform Acceptable relevancy-based retrieval for large heterogeneous collections Acquisition of data and metadata for aggregated index
Opportunities Climate more favorable to harvesting e-content for indexing Highly scalable, open source tools for discovery infrastructure: Lucene, Solr Many ongoing synergistic projects as possible collaborative partners
Potential Commercial Partners Three commercial organizations will participate in the seminar: Ex Libris Serials Solutions EBSCO Each has negotiated access to commercial content products Paved the way for library driven projects
Summa State and University Library of Denmark Locally built integrated search Catalogs + articles Failed to receive EU funding due to lack of guarantees to receive article data from publishers Now partnering with Serials Solutions to use article index from Summon via API
Trove National Library of Australia Previously called Single Business Discovery Project Brings together many previously separate discovery systems Built in-house at NLA Prototype released May 2009 Includes some full-text as well as metadata Technology: Java, Lucene, Solr, MySQL Details: http://www.nla.gov.au/pub/gateways/issues/101/story01.html
What about OCLC? WorldCat: ever expanding repository of metadata Books mostly, increasing article metadata Focused on expanding WorldCat for broad discovery ArticleFirst 23 million records April 2009 agreement with EBSCO for article metadata (withdrawn?). Quantity of article metadata apparently not on track to attain the same level of comprehensiveness as seen in Summon, EDS, Primo Central
Developing the Data Well / Aggregated index Aggregation of metadata and content Normalization: map metadata to make indexing, facets, and presentation meaningful De-duplication of records within and between content sources FRBR: collapsible groupings according to FRBR concepts (work, expression, manifestation, item)
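The three steps on this slide (normalization, de-duplication, grouping) can be sketched as below. The sample records, the field mappings, and the choice of title plus author as a stand-in for a FRBR "work" are all assumptions for illustration; real FRBR grouping needs much richer matching than this.

```python
import re
from collections import defaultdict

# Hypothetical records from two sources that use different field names.
RAW = [
    {"source": "ils", "Title": "Hamlet ", "Author": "Shakespeare, William", "Year": "1992", "Format": "book"},
    {"source": "vendor", "ti": "HAMLET", "au": "Shakespeare, William", "yr": "1992", "fmt": "book"},
    {"source": "vendor", "ti": "Hamlet", "au": "Shakespeare, William", "yr": "2003", "fmt": "ebook"},
]

# Normalization: map each source's fields onto one common schema.
FIELD_MAP = {
    "ils": {"Title": "title", "Author": "author", "Year": "year", "Format": "format"},
    "vendor": {"ti": "title", "au": "author", "yr": "year", "fmt": "format"},
}


def normalize(raw):
    """Rename source-specific fields to the common schema and trim whitespace."""
    mapping = FIELD_MAP[raw["source"]]
    record = {target: raw[field].strip() for field, target in mapping.items()}
    record["source"] = raw["source"]
    return record


def key_text(value):
    """Lowercase and collapse punctuation so trivial variants compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", value.lower()).strip()


def dedup_key(record):
    """Records with the same title, author, year, and format count as duplicates."""
    return (key_text(record["title"]), key_text(record["author"]), record["year"], record["format"])


def work_key(record):
    """Crude approximation of a FRBR work: same title and author."""
    return (key_text(record["title"]), key_text(record["author"]))


def build_groups(raw_records):
    """Normalize, drop duplicates (keeping the first copy), then group by work."""
    unique = {}
    for raw in raw_records:
        record = normalize(raw)
        unique.setdefault(dedup_key(record), record)
    works = defaultdict(list)
    for record in unique.values():
        works[work_key(record)].append(record)
    return dict(works)


groups = build_groups(RAW)
```

In this sample the two 1992 print records collapse into one, and the print and e-book editions end up grouped under a single work entry.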
Content sources populating the Aggregated Index Article metadata and full text Index views according to profile Coordinated with local OpenURL knowledge bases Digital Collections LMS Metadata Books, Microfilm, periodical titles, DVD, etc Blending of vendor provided metadata and locally managed unique content At the cusp of being able to represent library collections comprehensively
Acquiring content for Aggregated Index Agreements with publishers and providers of article content to libraries Open access content Any OAI target Local digital collections Relevant library catalog data OK with OCLC record use policies when aggregated at a national level?
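For the "any OAI target" item, harvesting normally means issuing OAI-PMH `ListRecords` requests and following resumption tokens until the set is exhausted. The sketch below builds such a request URL and parses a response using only the Python standard library. The base URL and the sample XML are made up for illustration, and no network request is sent.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Namespaces used by OAI-PMH 2.0 responses and unqualified Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token:
        # resumptionToken is an exclusive argument: it replaces metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, next_token) from a ListRecords response document."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.iterfind(".//oai:record", NS):
        identifier = rec.findtext("oai:header/oai:identifier", default="", namespaces=NS)
        title = rec.findtext(".//dc:title", default="", namespaces=NS)
        records.append({"identifier": identifier, "title": title})
    token = root.findtext(".//oai:resumptionToken", default="", namespaces=NS)
    return records, (token or None)


# Hypothetical response, trimmed to the elements the parser reads.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

first_url = list_records_url("https://repository.example.org/oai")
records, next_token = parse_list_records(SAMPLE)
next_url = list_records_url("https://repository.example.org/oai", resumption_token=next_token)
```

A harvester would fetch `first_url`, parse the response, and keep requesting `next_url` until no resumption token is returned.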
Data Well Construction Technical Assembling technologies of adequate scale and capacity Indexing, Search and retrieval Normalizing Business / Political Agreements with commercial publisher to provide metadata or content Increasing expectation from libraries to allow harvesting for discovery (Similar to COUNTER compliance, OpenURL support) Improved performance at delivering library end users to publisher content
Relationship with OpenURL Knowledgebase The aggregation of article-level citations and content relates to journal title-level profile and availability data in the OpenURL knowledgebase Important source of profiling needed to deliver appropriate views of the index for different libraries.
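One way to picture this relationship: the knowledge base records, per library, which journal titles and coverage years are licensed, and that profile filters the shared article index into a library-specific view. The sketch below assumes a hypothetical holdings structure keyed by ISSN with first and last coverage year; the library names, ISSNs, and articles are invented for illustration.

```python
# Hypothetical knowledge base: library -> {ISSN: (first year, last year)}.
KNOWLEDGE_BASE = {
    "library_a": {"1234-5678": (1995, 2010), "2222-3333": (2000, 2005)},
    "library_b": {"1234-5678": (2005, 2010)},
}

# Hypothetical article-level records in the aggregated index.
ARTICLES = [
    {"id": "a1", "issn": "1234-5678", "year": 1998},
    {"id": "a2", "issn": "1234-5678", "year": 2007},
    {"id": "a3", "issn": "2222-3333", "year": 2008},
]


def is_available(article, holdings):
    """True when the article's journal and year fall inside the library's coverage."""
    coverage = holdings.get(article["issn"])
    if coverage is None:
        return False
    first_year, last_year = coverage
    return first_year <= article["year"] <= last_year


def index_view(articles, library):
    """Return the ids of the articles a given library's profile makes visible."""
    holdings = KNOWLEDGE_BASE.get(library, {})
    return [a["id"] for a in articles if is_available(a, holdings)]


view_a = index_view(ARTICLES, "library_a")
view_b = index_view(ARTICLES, "library_b")
```

The same three index records yield different views: one library sees two of the articles, the other only one, without any change to the index itself.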
A labor-intensive project Business process Develop relationships with providers and publishers Construct contracts and licenses Technical Create import process for each source: Normalization, Mapping, de-duplication, FRBR groupings Initial load + constant incremental updates Creation of highly scalable indexing and retrieval platform Must scale up to 1 billion articles Develop algorithms and tunings for appropriate relevancy rankings Interface design
Building Expectations for Article Discovery Libraries should require agreements for harvesting as part of content licensing process Library licenses have led to broad support for: COUNTER SUSHI OpenURL Linking
Beyond Metadata Increasing expectation for full-text indexing Capacity present in e-journals for many years Full-text book indexing more problematic Much full text not available Complex to index
Heterogeneous index Books – mere millions Articles – many hundreds of millions Digital objects – many hundreds of millions
How to deal with non-harvestable resources Metasearch? Resource recommendation service Database spotlighting
Positioning of Discovery vs native Interfaces Current generation of discovery interfaces lacks important features Service delivery (items borrowed, renewals, fee payments, etc) Browse and other advanced search or retrieval features Many libraries use native Web-based catalog to supplement Native interfaces of major information products appeal to discipline specialists
Content + Services Must go beyond discovery to fulfillment Further integration of user services features into discovery interface Increased resource sharing capabilities
LIBRIS National Union Catalog > Local catalog? Local LMS?
Service-oriented architecture Key technology for interoperability among diverse software applications New applications built with SOA throughout Legacy applications with a services layer
Aggregating data and metadata Open source Commercial partnerships
Mobile access to library content and services New opportunity to retain and attract library users Mobile web and apps Working toward a unified Mobile library presence Unify disjointed mobile silos with the same ambitions as we have for the Web