Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search William H. Mischo Grainger Engineering Library.


1 Emerging Information Technologies: The Role of XML, DOIs, OpenURL, and Federated Search William H. Mischo w-mischo@uiuc.edu Grainger Engineering Library Information Center University of Illinois at Urbana-Champaign 2002 International Conference on Digital Archive Technologies (ICDAT2002) December 19, 2002

2 Outline Digital Libraries and the Distributed Information Environment. Document Representation and Full-Text Digital Library Tools. Illinois Projects. XML Technologies. Metadata Technologies. DOIs, Linking, Local Resolver. Portals, Simultaneous Search, Linking. Grainger Search Aid. Issues & Trends.

3 The Digital Library ‘Digital’, ‘Virtual’, ‘Electronic’ Library as network-based library without regard to place and time. Tendency to apply term to collections and resources. Digital Collections vs. Digital Library. Emphasis on the integration of collections and services (e.g. NSDL grant). Application of standards and protocols is important.

4 Scholarly Communication Overview E-Resources are Web-based and publisher-centric. Growth of Heterogeneous Distributed Repositories. Value-added services and ‘branding’ of journals. Prestige of Journals and Publishers. Reciprocal linking relationships between publishers. Cooperation on linking standards (DOI, CrossRef). Alternative publishing models - Academia, Preprint Servers, disintermediation.

5 Distributed Information Environment We live in a world of multiple, heterogeneous information repositories, resources, portals, and IR systems. –OPACs – local, regional, national shared bibliographic databases. –Local and remote A & I Services. –Discrete publisher and vendor repositories (full-text). –Web search engines, vertical portals, custom portals (NSDL, ARL Portal). –Local metadata, digital objects, GIS, finding aids. –Preprint servers and institutional repositories (DSpace). –Instructional (course) management systems (WebCT, Blackboard). –Harvestable (OAI) sites and services.

7 Distributed Repository - Issues Integration of discrete, heterogeneous information resources. Role of federated and broadcast searching of distributed resources. Integration of collections with reference, instructional and navigation services -TOC, remote reference assistance. Integration of Library, institutional, vendor, publisher, and government portals and information services. Linking technologies. Metadata harvesting, archiving.

8 Distributed Environment Action Plan Pressing need for document representation, retrieval, transmission, and linking middleware tools and standards. Metadata standards, DOIs, OpenURL. Factor: changing landscape of Scholarly Communication and disintermediation of publishers and libraries. Federated search and simultaneous search with reference linking as mechanism to integrate DL landscape.

9 Portal Functions: authorization; linking mechanisms between resources and among resources; simultaneous search; navigation. Linking: between full-text using DOI, CrossRef, Appropriate Copy; between A&I and full-text; between OPAC and full-text. [Diagram labels: Web Client; Portal Presentation Level; Local Link Server, Local Value-Added; OPAC; A&I Services (Local and Remote); Full-Text Resources; Local Databases and OAI Resources via DBMS; Web Resources & Knowledge Environments; E-Resource Registry; Aggregator (Ebsco, OCLC); Publisher Portal (Elsevier); CrossRef Metadata; DOI Server.]

10 Document Representation Continuum of Web-Enabled technologies -- all presently being utilized. Evolving technologies and standards. Role and history of markup. XML: its role and importance. The Smart Document.

12 Digital Library Tools We have at our disposal the tools to create integrated digital libraries from the distributed digital resources environment in which we operate: –Standard retrieval environment (Web) and interface/client (Web Browser); –Standard transport mechanisms to connect heterogeneous content (HTTP, OAI, SOAP); –Standard metalanguages and tools for describing and transforming content and metadata (XML, DTDs & Schemas, XSLT, DC/DCQ, RDF, METS); –Standardized search/retrieval mechanisms (HTTP Post/Get, SQL, Z39.50, Object Oriented Databases); –Standard linking tools and infrastructure (DOI, OpenURL, CrossRef). Candidate set of ‘best practices’ for IR.

13 Work by Illinois DLI Group We are attempting to address many of these issues within the Digital Library Initiatives group. Headquartered at Grainger Engineering Library Information Center at UIUC. Grant Work: –Digital Library Initiative I (NSF, others), 1994-1998. –Corporation for National Research Initiatives (CNRI) D-Lib Test Suite, 1998-2001. –Collaborating Partners Program, 1998--. –Andrew W. Mellon Foundation OAI Harvesting grant, 2001-2002. –NSF NSDL (National Science, Engineering, Technology, and Mathematics Digital Library) Program, 2002-2004. –Institute of Museum and Library Services (IMLS) Registry and Integration grant, 2002-2005.

14 Illinois Testbed Project Funded under DLI-I by NSF, DARPA, and NASA, 1994--1998. Awards made to 6 universities. Large-scale Testbed, Distributed Repository models, evaluation, Web software. Funded under CNRI D-Lib Test Suite Program, 1998--2001. Collaborating Partners Program. AIP, APS, ASCE, IEE, NRL, ASM, ACM, NTT Learning Systems, Elsevier. All XML Journal -- AIP, APS, ACM.

15 Illinois Full-Text Testbed American Institute of Physics--APL, JAP, RSI –19,000+ articles, 1995--. American Physical Society--PRL –15,000+ articles, 1995--, weekly updates. ASCE Journals (25 titles) –11,000+ articles, 1995--. IEE Proceedings and Electronics Letters –9,500+ articles, 1993--. IEEE Computer Society. ASM International (formerly American Society for Metals) Handbook. ACM (Association for Computing Machinery) Transactions. Elsevier Science.

16 Accomplishments Process & retrieve from multiple publishers & heterogeneous DTDs. SGML to XML Conversion. Development of a metadata specification that uses RDF, Dublin Core (DCQ and XML) XML Schemas, local Namespace. Cross-repository searching (Testbed & D-LIB Test Suite). Full-Text and Metadata. XSLT, CSS, for transformation & rendering, including Mathematics.

17 Accomplishments (2) Introduction of numerous technologies now deployed within publisher repositories: –Forward and Backward links in bibliographies -- within Testbed/Repository, from/to A & I Services. –Use of XSLT for transforming XML to HTML. –Rich extended abstracts. Conversion of ISO 12083 math markup to MathML. CSS/DHTML mathematics rendering. Use of plug-ins. Enhanced Web retrieval mechanisms: Author Word Wheels, Co-Occurrence Matrices. Local Link Server for DOIs, Context-Sensitive linking.

18 XML (eXtensible Markup Language) Like SGML, a Data Description Metalanguage. XML is a subset/version of SGML. Document representation and interchange Standard. Allows fine-granularity markup of content and structure. Authors can create their own elements (extensible). Tags define the structure of the document, not the presentation format. Validated vs. “well-formed” - separation of authoring process from representation & presentation. A document is either validated against a DTD/Schema or merely well-formed. Integrated with relational DBs.
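
The distinction between "well-formed" and "validated" can be illustrated with a minimal Python sketch. It checks well-formedness only, using the standard library parser; validation against a DTD or Schema would need a validating parser and is not shown. The function name and sample strings are illustrative assumptions, not part of the presentation.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text: str) -> bool:
    """Return True if text parses as well-formed XML.

    This does NOT validate against a DTD or Schema; it only checks
    syntax rules such as properly nested and closed tags.
    """
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents: the second never closes <title>.
good = "<article><title>Quantum-well lasers</title></article>"
bad = "<article><title>Quantum-well lasers</article>"

print(is_well_formed(good))
print(is_well_formed(bad))
```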

19 XML Features The milestones in document description and transmission: ASCII, TCP/IP, HTTP and HTML, XML. Web Programmability. DTD not required with XML. Needed if internal entities are used. Use of Document Object Model (DOM). Technology approach from Web developer’s standpoint: XML data, CSS presentation layer, XSLT to transform the structure (‘view’) of the data/document.

20 XML in Information Technologies Used in Open Archives Initiative (OAI), NSDL. Compatible with MS SQL Server, Tamino (Software AG), Oracle, DLXS/XPAT (University of Michigan/OpenText), others. Integral to Web Services (WSDL) and SOAP – Google Web Service. Used in Library of Congress MODS and METS metadata technologies. Baked into XyVision and publishing packages.

21 XML, XSLT, and CSS Use XML full-text articles as ordered hierarchy of content objects. Generate item-level metadata in XML, using RDF and Dublin Core syntax and semantics. XSLT and CSS used to present metadata and articles in either XML or HTML format depending on Browser. Mathematics rendering using MathML tools (conversion from ISO 12083 to MathML). Real-time transformation between XML and HTML using XSLT.
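
As a rough illustration of item-level metadata expressed in XML with RDF and Dublin Core, the following Python sketch builds one record with the standard library. The two namespace URIs are the published RDF and Dublin Core element namespaces; the function name, the field values, and the choice of a DOI-based identifier are assumptions made for the example and do not reproduce the Illinois metadata specification.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def dc_record(about, fields):
    """Build an rdf:RDF element holding one rdf:Description.

    about  -- URI identifying the described item
    fields -- list of (Dublin Core element name, value) pairs
    """
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(
        root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about}
    )
    for name, value in fields:
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return root


# Placeholder values; only the DOI 10.1063/333 comes from the slides.
record = dc_record(
    "http://dx.doi.org/10.1063/333",
    [("title", "Example article title"),
     ("creator", "Example, A."),
     ("identifier", "10.1063/333")],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```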

22 Schemas vs. DTDs Both are systems of representing a data model that defines the data’s elements and attributes, and the relationship among elements. Schema addresses limitations of DTDs and the increasingly data-oriented role of XML. W3C XML Schema Working Group: two documents: XML structures and datatypes.

23 Schema Justification Description of document type’s structure should be in an XML document instead of written in special syntax (DTD). Schemas are in XML: easier to edit and process using standard XML DOM manipulation tools. DTD notation doesn’t allow schema designers the power to impose strong data typing -- for example, the ability to say that a certain element type must always have a positive integer value, that it may not be empty, or that it must be one of a list of possible choices.

24 Metadata and Linking Standards Digital Object Identifier (DOI) and Persistent Object Identifiers. OpenURL and Value-Added Service Components (SFX). Open Archives Initiative (OAI), Dublin Core and Qualifiers, RDF. Local Resolver Servers.

25 Open Archives Initiative (OAI) Released version 1.0 of metadata harvesting protocols. Frozen through second quarter 2001. Mechanism for data providers to expose their metadata through an HTTP protocol and a mechanism for harvesting records containing metadata from repositories. Roots in e-print archives. Lightweight, low-barrier. Easy to implement Web server to handle OAI protocol requests; need to develop procedures to access and extract your metadata.
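
Because OAI harvesting rides on plain HTTP requests, a harvest request can be sketched as URL construction. The example below is an assumption-laden sketch: the base URL is hypothetical, and the `verb` and `metadataPrefix` parameter names follow the OAI-PMH protocol as commonly documented for version 2.0, which may differ in detail from the version 1.0 release mentioned on the slide.

```python
from urllib.parse import urlencode


def oai_request(base_url, verb, **params):
    """Build an OAI harvesting request URL for the given verb and arguments."""
    query = urlencode({"verb": verb, **params})
    return f"{base_url}?{query}"


# Hypothetical repository endpoint; oai_dc requests unqualified Dublin Core.
url = oai_request(
    "http://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",
)
print(url)
# http://repository.example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```

A harvester would issue this request over HTTP and parse the XML response; that step is omitted here.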

26 Ongoing Investigations Relationship between interoperability models for search and discovery: federated searching (OAI harvested) and broadcast, simultaneous searching of distributed repositories. Not mutually exclusive. OAI Provider and Harvesting software. Encoded Archival Description (EAD). OAI Engineering/CS/Physics site. Role of HTTP harvesting, Spider technology. Reference Linking integration built on OpenURL and DOI. Reference Assistant software with simultaneous search, point-of-contact assistance, and remote reference capability.

27 Portals and Gateways Role is to bring together and integrate disparate e-resources. Provide a systematic ‘view’ of the information landscape, particularly full-text. Two primary foci: robust search/navigation and the ability to link everywhere from anywhere in the environment of OPACs, A & I Services, full-text. Central to this implementation is federated and simultaneous search and reference linking technologies.

28 Digital Object Identifier (DOI) DOI is both a unique identifier of a piece of digital content AND a system to access that content digitally. Persistent object identifier. ‘The ISBN for the 21st Century’ -- Norman Paskin. DOI system has two main parts (the identifier and a directory system) and a third logical component, a database. Developed by AAP (Association of American Publishers), now managed by International DOI Foundation.

29 DOI Construction First real open standard for content identification. DOI is a number that identifies a digital object: –10.1063/S000369519903216 –10: Registration Agency Prefix –1063: Publisher Prefix –S000369519903216: Suffix (Publisher-assigned ID) Suffix can be SICI or PII. The DOI and the URL pointing to the digital object are registered with the International DOI Foundation, e.g.: –10.1063/333 | http://www.pubsite.org/apr99/artl1.pdf
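
The DOI anatomy on this slide can be sketched as a small parser. The field names below mirror the slide's labels; the class and function names are illustrative assumptions, and the sketch does no checking beyond the basic prefix/suffix shape.

```python
from typing import NamedTuple


class DoiParts(NamedTuple):
    agency_prefix: str      # "10" in the slide's example
    publisher_prefix: str   # e.g. "1063"
    suffix: str             # publisher-assigned ID (may be a SICI or PII)


def split_doi(doi: str) -> DoiParts:
    """Split a DOI of the form <agency>.<publisher>/<suffix> into its parts."""
    prefix, slash, suffix = doi.partition("/")
    agency, dot, publisher = prefix.partition(".")
    if not (slash and suffix and dot and agency and publisher):
        raise ValueError(f"not a DOI of the expected shape: {doi!r}")
    return DoiParts(agency, publisher, suffix)


parts = split_doi("10.1063/S000369519903216")
print(parts)
```

Only the first "/" separates prefix from suffix, so suffixes that themselves contain slashes are kept intact.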

30 Using a DOI DOIs are resolved using the Handle System technology from CNRI (Corporation for National Research Initiatives). Retrieval of object is a two-step process: the link is sent to a central directory where the current Web address is stored, then the location is sent back to the browser with a special message to redirect to that address, e.g.: –dx.doi.org/10.1063/333 redirects to www.pubsite.org/apr99/artl1.pdf
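
The two-step retrieval described here can be simulated without any network access. In this sketch an in-memory dictionary stands in for the central directory, and the HTTP status code 302 is an assumption standing in for the "special message" that tells the browser to redirect; neither reflects the actual Handle System implementation.

```python
# Stand-in for the central directory: DOI -> current Web address.
# The single entry is the example pair given on the slides.
DIRECTORY = {
    "10.1063/333": "http://www.pubsite.org/apr99/artl1.pdf",
}


def proxy_url(doi: str) -> str:
    """Form the link a browser would follow to resolve a DOI."""
    return "http://dx.doi.org/" + doi


def resolve(doi: str, directory=DIRECTORY):
    """Simulate resolution: look up the DOI, then answer with a redirect.

    Returns (status, location). 302 is an assumed redirect status;
    unknown DOIs yield (404, None).
    """
    location = directory.get(doi)
    if location is None:
        return 404, None
    return 302, location


print(proxy_url("10.1063/333"))
print(resolve("10.1063/333"))
```

Because the directory is the only place the address is stored, a publisher can move an article and update one entry without breaking existing DOI links.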

31 Reference Linking CrossRef Publisher system: major Sci-Tech professional societies and commercial publishers. System design calls for one URL for each DOI; underlying technology can handle multiple URLs however. Issue: Directing users to locally held or licensed version of Digital Object (locally loaded or from Aggregator). Appropriate Copy problem.
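
One way to picture the Appropriate Copy problem is a local lookup that runs before the global DOI proxy is consulted. The sketch below assumes a hypothetical table of locally held or licensed copies; the function name and the library URL are invented for illustration and do not describe the Illinois Local Link Server's actual logic.

```python
def appropriate_copy(doi, local_holdings, proxy="http://dx.doi.org/"):
    """Prefer a locally held or licensed copy; otherwise use the DOI proxy."""
    return local_holdings.get(doi, proxy + doi)


# Hypothetical local holdings table (locally loaded or aggregator copies).
holdings = {
    "10.1063/333": "http://library.example.edu/fulltext/10.1063/333",
}

print(appropriate_copy("10.1063/333", holdings))   # local copy
print(appropriate_copy("10.1063/1234", holdings))  # falls back to the proxy
```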

32 [Diagram labels: Client (Web Browser), Cookie on client; DOI Proxy (dx.doi.org/10.1063/1234); Handle Server; Illinois Local Link Server (OpenURL Aware); Local AIP, IEE; CrossRef Metadata Database; AIP, IEE, Elsevier; DOI Metadata; Local Value Added; Nosfx=y; UIUC Metadata Registry; OpenURL.]

33 Simultaneous Search Implementations DialIndex from Dialog. Ex Libris MetaLib service. Endeavor EnCompass. Innovative Interfaces MetaFind. Ovid Multiple Search and reference De-Duping. ISI Web of Knowledge. Gale Corporation InfoTrac Total Access. WebFeat. California Digital Library SearchLight system. Los Alamos FlashPoint system. Fretwell-Downing partnering with ARL Portal and Monash University.

34 Grainger Search Aid Assist users in the selection of appropriate databases. Normalize user search arguments and display search results from candidate databases. Cross-database asynchronous concurrent searching. Article level and e-journal Web site access to publisher full-text repositories. Utilize OpenURL, CrossRef metadata database and DOI for reference linking at the article level. Proxying of vendor systems and capability of ‘taking over’ the search in vendor native mode.

35 Grainger Search Aid

39 Reference Assistant Project Utilize Search Aid simultaneous search and link capabilities. Opportunity to explore interface and navigation issues. Mimics the behavior of reference librarian. Allows the application of ‘best match’ and ‘quorum searching’ algorithms.
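
Quorum searching is commonly understood as accepting documents that match at least some threshold number of query terms, rather than requiring all of them. The sketch below follows that reading, which is an assumption about what the slide intends; the sample documents are invented, and matching is naive whole-word comparison without stemming.

```python
def quorum_search(query_terms, documents, quorum):
    """Return ids of documents matching at least `quorum` distinct query terms.

    Results are ordered by number of matched terms (most first),
    with ties broken by document id.
    """
    terms = {t.lower() for t in query_terms}
    hits = []
    for doc_id, text in documents.items():
        matched = len(terms & set(text.lower().split()))
        if matched >= quorum:
            hits.append((matched, doc_id))
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [doc_id for _, doc_id in hits]


# Hypothetical mini-collection.
docs = {
    "a": "temperature analysis of quantum well lasers",
    "b": "quantum computing survey",
    "c": "laser temperature control",
}
query = ["temperature", "quantum", "lasers"]

print(quorum_search(query, docs, quorum=2))
print(quorum_search(query, docs, quorum=1))
```

Lowering the quorum from 2 to 1 widens the result set, which is the behaviour that lets a search degrade gracefully when a strict match finds nothing.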

40 Reference Assistant Top Menu

43 Simultaneous Search Implementations Shared Blackboard approach employing Independent Searchbots dedicated to searching information resources and passing results to Web clients. Event-Driven, Asynchronous HTTP Queries from within a Single Script returning results to Web browser.

44 Event-Driven, Asynchronous Queries Single, event-driven web server process, asynchronously querying multiple resources. Uses WinHTTP from ASP and VBScript. Simpler, not as flexible. Search algorithms and processing coded in scripts. This is the approach we currently use for our service. Implementation of multi-step login and session variable passthru being investigated.
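
The single-process, asynchronous query pattern can be sketched in Python with asyncio. This is a swapped-in technique for illustration, not the WinHTTP/ASP/VBScript code the slide refers to. The source names and delays are hypothetical, and `asyncio.sleep` stands in for a real HTTP round trip.

```python
import asyncio


async def query_source(name, delay, query):
    """Pretend to search one resource; the sleep simulates network latency."""
    await asyncio.sleep(delay)
    return name, [f"{name} hit for {query!r}"]


async def simultaneous_search(query, sources, timeout=1.0):
    """Query all sources concurrently and keep whatever answers in time."""
    tasks = [
        asyncio.create_task(query_source(name, delay, query))
        for name, delay in sources.items()
    ]
    done, pending = await asyncio.wait(tasks, timeout=timeout)
    for task in pending:
        task.cancel()  # drop slow sources rather than block the user
    return dict(task.result() for task in done)


# Hypothetical sources with simulated response times in seconds.
sources = {"OPAC": 0.01, "A&I": 0.02, "FullText": 5.0}
results = asyncio.run(
    simultaneous_search("quantum-well lasers", sources, timeout=0.5)
)
print(sorted(results))
```

The slow "FullText" source is cancelled at the timeout, so the user sees results from the responsive sources without waiting for the slowest one.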

45 OpenURL-Based Services Standard for expressing and transmitting metadata. Promise of standardized, normalized search results. Provides value-added links to the Ovid search results. Using CrossRef metadata database to look up DOIs.

46 CiteParse.dll An ActiveX DLL which can parse various Ovid citations and turn them into OpenURLs: Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. Temperature analysis … quantum-well lasers. [Article] IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun. http://…/resolver.asp?genre=article&aulast=Tansu&auinit1=N&atitle=Temperature+analysis+…+quantum-well+lasers&title=IEEE+Journal+of+Quantum+Electronics&volume=38&issue=6&spage=640&epage=651&pages=640-651&date=2002-06
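
The citation-to-OpenURL mapping shown on this slide could be approximated as follows. This is a Python sketch, not the CiteParse.dll code, and the regular expression is an assumption fitted to the single citation format shown here. It returns only the query string, since the resolver's base address is elided in the slide. Note that `urlencode` percent-encodes the "…" character that the slide shows literally.

```python
import re
from urllib.parse import urlencode

MONTHS = {
    name: f"{number:02d}"
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)
}

# Assumed layout: "Last INIT. " repeated, article title, "[Genre]",
# journal title, then "vol(issue):spage-epage, YYYY Mon."
CITATION = re.compile(
    r"^(?P<authors>(?:[^\s.]+ [A-Z]+\. )+)"
    r"(?P<atitle>.+?)\. \[(?P<genre>\w+)\] "
    r"(?P<title>.+?)\. "
    r"(?P<volume>\d+)\((?P<issue>\d+)\):(?P<spage>\d+)-(?P<epage>\d+), "
    r"(?P<year>\d{4}) (?P<month>[A-Z][a-z]{2})\.$"
)


def citation_to_openurl_query(citation: str) -> str:
    """Turn one citation in the assumed layout into an OpenURL query string."""
    m = CITATION.match(citation.strip())
    if m is None:
        raise ValueError("unrecognized citation format")
    # Only the first author is carried into the OpenURL.
    aulast, initials = m["authors"].split(". ")[0].split(" ")
    fields = [
        ("genre", m["genre"].lower()),
        ("aulast", aulast),
        ("auinit1", initials[0]),
        ("atitle", m["atitle"]),
        ("title", m["title"]),
        ("volume", m["volume"]),
        ("issue", m["issue"]),
        ("spage", m["spage"]),
        ("epage", m["epage"]),
        ("pages", f"{m['spage']}-{m['epage']}"),
        ("date", f"{m['year']}-{MONTHS[m['month']]}"),
    ]
    return urlencode(fields)


citation = (
    "Tansu N. Chang YL. Takeuchi T. Bour DP. Corzine SW. Tan MRT. Mawst LJ. "
    "Temperature analysis … quantum-well lasers. [Article] "
    "IEEE Journal of Quantum Electronics. 38(6):640-651, 2002 Jun."
)
query_string = citation_to_openurl_query(citation)
print(query_string)
```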

47 Conclusions User reactions very positive. The one-stop-shopping approach has been successful. Users consider ability to link to full-text from citations in A & I Services and from references on publisher portals very helpful. Technically, best approach appears to be a hybrid of asynchronous client interface with Web Services querying databases. Moves database middleware to Web Services and eliminates extensive custom script code for search and database query.

48 Publishing Trends Publishers will continue to add value to online journal articles. Digital version will become version of record. Virtual journals (both publisher-based and cross-publisher) will become common. Next-generation knowledge environments will evolve. Multimedia, data exposed, live equations with in-place calculations.

49 Publishing Trends (Continued) Personalized services will be available -- agent technology, alerting services. Different economic and subscription models will be introduced. Deconstruction of Journal (Bob Kelly, APS); article at a time publishing. Journal branding or perhaps publisher branding. Academia issues: publishing, tenure.

50 Continuing Issues Role of Authors, Academic Institutions, Libraries, Publishers, Abstracting & Indexing Services. Disintermediation may affect both Libraries and Publishers. Information as Function not Place. Provide a ‘Digital Library’ out of digital collections. Role of XML technology. Service mechanisms: processing & archiving, search and discovery, presentation, linking.

