Tying it all Together: Integrating Digital Collections William H. Mischo, Mary C. Schlembach Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign InfoToday 2003 May 8, 2003
Outline Digital Libraries. Distributed Information Environment. Digital Library Tools. Metadata and Linking. Open Archives Initiative (OAI). Digital Object Identifiers (DOI). Simultaneous Search/Federated Search. Examples. Issues and Roles
The Digital Library ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time. Tendency to apply term to collections and resources. Digital Collections vs. Digital Library. Emphasis on the integration of collections and services. Application of standards and ‘best practices’ protocols is important.
Scholarly Communication Overview E-Resources are Web-based and publisher-centric. Growth of Heterogeneous Distributed Repositories. Value-added services and ‘branding’ of journals. Prestige of Specific Journals and Publishers. Reciprocal linking relationships between publishers. Cooperation on linking standards (DOI, CrossRef). Alternative publishing models - Academia-based, Preprint Servers.
Distributed Information Environment We live in a world of multiple, heterogeneous information repositories, resources, portals, and IR systems. –OPACs: local, regional, national shared bibliographic databases. –Local and remote A & I Services. –Discrete publisher and vendor repositories (full-text). –Open Preprint Servers (arXiv). –Web search engines, vertical portals, custom portals (NSDL, ARL Portal). Surface Web and Hidden Web. –Local metadata, digital objects, GIS, EAD finding aids. –Institutional Repositories (DSpace). –Instructional (course) management systems (WebCT, Blackboard). –Harvestable (OAI-based) sites and services.
Distributed Repository Issues Integration of discrete, heterogeneous information resources. Role of federated and broadcast searching of distributed resources. Integration of collections with reference, instructional and navigation services -TOC, remote reference assistance, Best-Match and Quorum Searching. Integration of Library, institutional, vendor, publisher, and government portals and information services. Standardized Linking technologies. Metadata harvesting, archiving.
Distributed Environment Action Plan Need for document representation, retrieval, transmission, and linking middleware tools and standards. Metadata standards, utilize DOIs, OpenURL. Factor: changing landscape of Scholarly Communication and disintermediation of publishers and libraries. Federated search and simultaneous search with reference linking as mechanism to integrate DL landscape.
Portal Architecture (diagram) Components: Web Client; Portal Presentation Level; Local Link Server, Local Value-Added; Local Databases and OAI Resources via DBMS; OPAC; A & I Services (Local and Remote); Full-Text Resources; Web Resources & Knowledge Environments; E-Resource Registry; Aggregator (Ebsco, OCLC); Publisher Portal (Elsevier); CrossRef Metadata; DOI Server. Portal Functions: –Authorization. –Linking mechanisms between resources and among resources. –Simultaneous search. –Navigation. Linking: –Between full-text using DOI, CrossRef, Appropriate Copy. –Between A & I and full-text. –Between OPAC and full-text.
Digital Library Tools We have at our disposal the tools to create integrated digital libraries from the distributed digital resources environment in which we operate: –Standard retrieval environment (Web) and interface/client (Web Browser); –Standard transport mechanisms to connect heterogeneous content (HTTP, OAI, SOAP); –Standard metalanguages and tools for describing and transforming content and metadata (XML, DTDs & Schemas, XSLT, DC/DCQ, RDF, METS); –Standardized search/retrieval mechanisms (HTTP Post/Get, SQL, Z39.50, Object Oriented Databases); –Standard linking tools and infrastructure (DOI, OpenURL, CrossRef). Candidate set of ‘best practices’ for IR.
Metadata and Linking Standards Digital Object Identifier (DOI) and Persistent Object Identifiers. OpenURL and Value-Added Service Components (SFX). Open Archives Initiative (OAI), Dublin Core and Qualifiers, RDF. Local Resolver Servers.
Open Archives Initiative (OAI) Released version 2.0 of metadata harvesting protocols. Mechanism for data providers to expose their metadata through an HTTP protocol and a mechanism for harvesting records containing metadata from repositories. Roots in e-print archives. Lightweight, low-barrier. Easy to implement Web server to handle OAI protocol requests; need to develop procedures to access and extract your metadata.
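As a rough illustration of how lightweight the harvesting side is, the sketch below builds an OAI-PMH 2.0 `ListRecords` request and pulls simple Dublin Core fields out of a response. The repository address, the record identifier, and the sample record are hypothetical placeholders, and no network call is made; a real harvester would also need to follow resumption tokens and handle protocol errors.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespaces defined by OAI-PMH 2.0 and simple Dublin Core.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords request URL for a data provider."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical response body, trimmed to one record for illustration.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header>
        <identifier>oai:repository.example.edu:1</identifier>
        <datestamp>2003-05-08</datestamp>
      </header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample technical report</dc:title>
          <dc:creator>Doe, Jane</dc:creator>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""


def parse_records(xml_text):
    """Extract identifier, titles, and creators from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for rec in root.findall("oai:ListRecords/oai:record", NS):
        records.append({
            "identifier": rec.findtext("oai:header/oai:identifier", namespaces=NS),
            "titles": [e.text for e in rec.findall(".//dc:title", NS)],
            "creators": [e.text for e in rec.findall(".//dc:creator", NS)],
        })
    return records


# Hypothetical base URL of a data provider.
request_url = list_records_url("https://repository.example.edu/oai")
harvested = parse_records(SAMPLE_RESPONSE)
```

The request is an ordinary HTTP GET with a `verb` parameter, which is why a plain Web server can act as a data provider once local metadata can be extracted and serialized.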
OAI Philosophical Issues Nature Opinion Column: “The Future of the Electronic Scientific Literature.” OAI as mechanism for Scholarly Communication. “...one of many alternatives now being offered to scientists to disseminate their work” –alternate models for SC –role of universities in SC OAI as a standard for Document Representation “…promoting common web standards for digital content” –DC simple –DC qualified (RDF) –other metadata formats -- e.g. MARC, EAD
OAI Philosophical Issues (2) OAI as a Transport Mechanism and Search Mechanism “…metadata standards to facilitate improved searching.” –OAI and Z39.50 –HTTP –SOAP and other XML transport mechanisms Identification and Transport but NOT Search and Discovery –no traditional search capabilities –requires search engine for retrieval Alternatives to OAI –PubMed Central, BioMed Central, E-Biosci –ResearchIndex (NEC); use of natural language full-text, not metadata
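The point that OAI handles identification and transport but not search can be made concrete: harvested metadata still has to be indexed before anyone can query it. The sketch below is a minimal inverted index with Boolean AND matching over hypothetical harvested records; it stands in for whatever search engine a service provider would actually deploy.

```python
from collections import defaultdict


def build_index(records):
    """Map each lowercased token to the set of record ids containing it."""
    index = defaultdict(set)
    for rec_id, fields in records.items():
        for value in fields.values():
            for token in value.lower().split():
                index[token].add(rec_id)
    return index


def search(index, query):
    """Return ids of records that contain every query term (Boolean AND)."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical records, as they might look after harvesting simple Dublin Core.
HARVESTED = {
    "oai:a.example.edu:1": {"title": "Reference linking with DOI", "creator": "Doe"},
    "oai:b.example.edu:7": {"title": "Metadata harvesting in practice", "creator": "Roe"},
    "oai:b.example.edu:9": {"title": "Linking metadata and full text", "creator": "Doe"},
}

index = build_index(HARVESTED)
hits = search(index, "linking doe")
```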
Ongoing Investigations Relationship between interoperability models for search and discovery: federated searching (OAI harvested) and broadcast, simultaneous searching of distributed repositories. Not mutually exclusive. OAI Provider and Harvesting software. Encoded Archival Description (EAD). OAI Engineering/CS/Physics site. Role of HTTP harvesting, Spider technology. Reference Linking integration built on OpenURL and DOI. Reference Assistant software with simultaneous search, point-of-contact assistance, and remote reference capability.
Portals and Gateways Role is to bring together and integrate disparate e-resources. Provide a systematic ‘view’ of the information landscape, particularly full-text. Two primary foci: robust search/navigation and the ability to link everywhere from anywhere in the environment of OPACs, A & I Services, full-text. Central to this implementation is federated and simultaneous search and reference linking technologies.
Digital Object Identifier (DOI) DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. Persistent object identifier. ‘The ISBN for the 21st Century’ -- Norman Paskin. DOI system has two main parts (the identifier and a directory system) and a third logical component, a database. Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.
Reference Linking CrossRef Publisher system: major Sci-Tech professional societies and commercial publishers. System design calls for one URL for each DOI; the underlying technology can, however, handle multiple URLs. Issue: Directing users to locally held or licensed version of Digital Object (locally loaded or from Aggregator). Appropriate Copy problem.
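The two main parts can be shown in a few lines: the identifier splits into a prefix and a suffix, and the directory is reached by handing the identifier to a resolver. The sketch below assumes the public resolver at `https://doi.org/`; the function names and the second sample DOI are illustrative only.

```python
from urllib.parse import quote


def split_doi(doi):
    """Split a DOI into (prefix, suffix); the suffix may itself contain slashes."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError("not a DOI: %r" % doi)
    return prefix, suffix


def resolver_url(doi, resolver="https://doi.org/"):
    """Build the URL that asks the directory where the object currently lives."""
    prefix, suffix = split_doi(doi)
    # Percent-encode characters in the suffix that are unsafe in a URL path.
    return resolver + prefix + "/" + quote(suffix, safe="/")


handbook = resolver_url("doi:10.1000/182")
awkward = resolver_url("10.1000/abc<1>")  # made-up suffix with unsafe characters
```

Persistence comes from the indirection: the citation carries only the identifier, and the directory entry can be updated when the content moves.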
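One way to picture the Appropriate Copy problem is as a lookup that a local resolver performs before falling back to the publisher's default location. The sketch below is a simplified illustration, not a description of any particular link server: the holdings table, the source labels, the preference order, and all URLs are hypothetical.

```python
# Preferred order of copies: locally loaded first, then a licensed aggregator copy.
PREFERENCE = ["local", "aggregator"]

# Hypothetical local holdings: DOI -> list of (source, URL) pairs.
HOLDINGS = {
    "10.1000/182": [
        ("aggregator", "https://aggregator.example.com/article/182"),
        ("local", "https://fulltext.library.example.edu/10.1000/182"),
    ],
    "10.1000/200": [
        ("aggregator", "https://aggregator.example.com/article/200"),
    ],
}


def appropriate_copy(doi, holdings, default_resolver="https://doi.org/"):
    """Return the best locally available copy, else the default DOI target."""
    candidates = holdings.get(doi, [])
    for wanted in PREFERENCE:
        for source, url in candidates:
            if source == wanted:
                return url
    # Nothing held or licensed locally: defer to the single registered URL.
    return default_resolver + doi


chosen = appropriate_copy("10.1000/182", HOLDINGS)
```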
Simultaneous Search Implementations DialIndex from Dialog. Ex Libris MetaLib service. Endeavor EnCompass. Innovative Interfaces MetaFind. Ovid Multiple Search and reference De-Duping. ISI Web of Knowledge. Gale Corporation InfoTrac Total Access. WebFeat. California Digital Library SearchLight system. Los Alamos FlashPoint system. Fretwell-Downing partnering with ARL Portal.
Grainger Search Aid Assist users in the selection of appropriate databases. Normalize user search arguments and display search results from candidate databases. Cross-database asynchronous concurrent searching. Article level and e-journal Web site access to publisher full-text repositories. Utilize OpenURL, CrossRef metadata database and DOI for reference linking at the article level. Proxying of vendor systems and capability of ‘taking over’ the search in vendor native mode.
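Cross-database asynchronous concurrent searching can be sketched with Python's `asyncio`: the user's query is normalized once, sent to every candidate database at the same time, and the results are collected per database. This is only an illustration of the pattern, not the Search Aid's actual implementation; the backend names, records, and delays below are invented, and in-memory lists stand in for remote systems.

```python
import asyncio


def normalize(query):
    """Lowercase the query and collapse runs of whitespace."""
    return " ".join(query.lower().split())


async def search_backend(name, records, query, delay):
    """Simulate one remote database: wait, then return titles matching all terms."""
    await asyncio.sleep(delay)  # stands in for network latency
    terms = query.split()
    hits = [r for r in records if all(t in r.lower() for t in terms)]
    return name, hits


async def search_all(query, backends):
    """Broadcast the normalized query to every backend concurrently."""
    q = normalize(query)
    tasks = [
        search_backend(name, records, q, delay)
        for name, (records, delay) in backends.items()
    ]
    # return_exceptions=True keeps one failing backend from aborting the rest.
    outcomes = await asyncio.gather(*tasks, return_exceptions=True)
    results = {}
    for outcome in outcomes:
        if isinstance(outcome, Exception):
            continue  # skip backends that failed
        name, hits = outcome
        results[name] = hits
    return results


# Hypothetical backends: name -> (records, simulated delay in seconds).
BACKENDS = {
    "catalog": (["Digital Libraries Handbook", "Metadata Basics"], 0.02),
    "a_and_i": (["Linking digital libraries with DOI", "Federated search survey"], 0.01),
}

results = asyncio.run(search_all("  Digital   LIBRARIES ", BACKENDS))
```

Because the searches run concurrently, total wait time is governed by the slowest backend rather than the sum of all of them.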
Reference Assistant Project Utilize Search Aid simultaneous search and link capabilities. Opportunity to explore interface and navigation issues. Mimics the behavior of reference librarian. Performs “Best-Match” and Quorum Searching – selects optimum result sets for the user.
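Quorum searching is commonly described as relaxing a query step by step: first require all terms, then all but one, and so on, stopping at the strictest level that returns anything. The sketch below follows that reading; it is an assumption about the general technique, not the Reference Assistant's code, and the documents are invented.

```python
def quorum_search(query, documents, min_terms=1):
    """Return (level, doc_ids) for the strictest quorum level with any hits.

    level is the number of distinct query terms each returned document matches
    at minimum; (0, []) means nothing matched even at min_terms.
    """
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        scored.append((len(terms & words), doc_id))
    # Start with all terms required, then relax one term at a time.
    for level in range(len(terms), min_terms - 1, -1):
        hits = sorted(doc_id for matched, doc_id in scored if matched >= level)
        if hits:
            return level, hits
    return 0, []


# Hypothetical mini-collection.
DOCS = {
    "d1": "digital library linking",
    "d2": "digital object identifier",
    "d3": "course management systems",
}

level, best = quorum_search("digital library metadata", DOCS)
```

Here no document contains all three terms, so the search falls back to two of three and returns only the closest match instead of an empty result.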
OpenURL-Based Services Standard for expressing and transmitting metadata. Promise of standardized, normalized search results. Provides value-added links to the Ovid search results. Using CrossRef metadata database to look up DOIs.
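An OpenURL carries citation metadata as key/value pairs appended to the address of a local resolver. The sketch below assembles one using keys from the original OpenURL 0.1 convention (`genre`, `atitle`, `aulast`, `volume`, `spage`, `date`, `id`, `sid`); the resolver address, the source identifier, and the citation itself are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_openurl(resolver_base, citation, sid=None):
    """Serialize citation metadata as an OpenURL query on a local resolver."""
    params = {key: value for key, value in citation.items() if value}
    if sid:
        params["sid"] = sid  # identifies the service that generated the link
    return resolver_base + "?" + urlencode(params)


# Hypothetical citation; empty fields are dropped from the link.
CITATION = {
    "genre": "article",
    "atitle": "Linking digital collections",
    "aulast": "Doe",
    "volume": "12",
    "issue": "",
    "spage": "45",
    "date": "2003",
    "id": "doi:10.1000/182",
}

openurl = build_openurl(
    "https://resolver.example.edu/openurl",  # hypothetical local link server
    CITATION,
    sid="example:searchaid",
)

# Decode the query string again to show the metadata survives the round trip.
decoded = parse_qs(urlsplit(openurl).query)
```

Because the metadata travels with the link, the receiving resolver can decide locally which services to offer, which is what makes value-added links and Appropriate Copy resolution possible.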
Continuing Issues Role of Authors, Academic Institutions, Libraries, Publishers, Abstracting & Indexing Services. Disintermediation may affect both Libraries and Publishers. Information as Function not Place. Provide a ‘Digital Library’ out of digital collections. Role of XML technology. Service mechanisms: processing & archiving, search and discovery, presentation, linking.
Issues (1) Library is now as much Function as it is a place. Library provides many remote access services (online catalogs, A & Is, journals). Better Support for the Library’s Role in the Campus Information Infrastructure. Provide a ‘Digital Library’ out of distributed, unrelated digital collections (DOI links). The typical faculty member goes first to a personal collection, then asks a colleague, sends e-mail, goes to the Web, and finally goes to the Library.
Role of the Library Function of Library has always been: –Collect and Archive source materials; –Organize materials; –Provide access to materials. Modern Library: same, but with new Web access technologies and publishers emphasizing online journals: above activities are now distributed, not confined to a specific place. Integration with Course Management Software (WebCT, Blackboard). Advanced Search and Navigation mechanisms.
Role of the Library (2) Question: How do the support services for these activities need to change? We need to do remote reference. We need integrated access to online journals and Web resources, and better search, discovery, and navigation. Big Question: How will changes in scholarly communication change the role of the library and the librarian?
4th Generation Information Systems Integration of heterogeneous information resources. Broadcast and Simultaneous Searching of Multiple Resources tailored to user needs. Remote Reference and Instruction (Collaboration software, multimedia, suggestion-based systems). Integration with Learning Management Systems. Software-Aided Search Navigation (Best-Match searching) and Strategy Modification. Dynamic Links to Full-Text. Appropriate Copy problem. One-Stop-Shopping.