Introduction to CNIDR’s Isearch
Archie Warnock, A/WWW Enterprises


1 Introduction to CNIDR’s Isearch
Archie Warnock, A/WWW Enterprises
warnock@clark.net

2 Who is MCNC/CNIDR?
• MCNC = Microelectronics Center of North Carolina
• CNIDR = Clearinghouse for Networked Information Discovery and Retrieval
• Originally funded by NSF to coordinate and produce network information tools
• Now developing public domain and commercial search/retrieval tools

3 What is Isearch?
• Isearch is the successor to freeWAIS
• Isearch is a sophisticated full-text search and retrieval system
• Isearch is a component of Isite, an implementation of the NISO standard protocol Z39.50 for information search and retrieval
• ftp://ftp.cnidr.org/pub/NIDR.tools/Isearch
• http://vinca.cnidr.org/software/Isearch/Isearch.html

4 Terminology - I
• Client/server - an architecture to allow communications between programs, possibly on different computers
• Protocol - the communication “language” used by client and server programs
• http - the protocol used by WWW clients and servers
• CGI - mechanism to process WWW forms

5 Terminology - II
• Query - user-supplied search criteria
• Full-text search - word-based search of all the text in a document
• Fielded search - word-based search of text within only certain fields in a document
• Z39.50 - a standard protocol for network-based document search and retrieval
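
The difference between full-text and fielded search can be illustrated with a minimal sketch. The records, field names, and function names below are hypothetical and are not taken from Isearch; the point is only that a fielded search looks at one named field while a full-text search looks at every field.

```python
# Hypothetical records; the field names are illustrative only.
RECORDS = [
    {"title": "Matrix norms", "author": "Smith", "abstract": "Bounds on eigenvalues"},
    {"title": "Eigenvalue bounds", "author": "Jones", "abstract": "A matrix inequality"},
]


def words(text):
    """Split text into lowercase words."""
    return text.lower().split()


def full_text_search(records, term):
    """Return positions of records that contain the term in any field."""
    term = term.lower()
    return [i for i, rec in enumerate(records)
            if any(term in words(value) for value in rec.values())]


def fielded_search(records, field, term):
    """Return positions of records that contain the term in one named field."""
    term = term.lower()
    return [i for i, rec in enumerate(records)
            if term in words(rec.get(field, ""))]


print(full_text_search(RECORDS, "matrix"))         # both records mention "matrix"
print(fielded_search(RECORDS, "title", "matrix"))  # only the first has it in the title
```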

6 System Components - I
• Iindex, the Text Indexer - builds searchable version of the document collection
• Implements fast word-based searching
• Document parser - recognizes start/end of individual documents
• Field parser - recognizes start/end of fields within individual documents
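
Word-based indexing of this kind is commonly done with an inverted index, which maps each word to the documents that contain it. The sketch below is a generic illustration, not Iindex's actual data structure: the blank-line document separator and all function names are assumptions.

```python
from collections import defaultdict


def split_documents(raw, separator="\n\n"):
    """Document parser: assumes a blank line marks a document boundary."""
    return [doc.strip() for doc in raw.split(separator) if doc.strip()]


def build_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in text.lower().split():
            word = token.strip(".,;:!?\"'()")
            if word:
                index[word].add(doc_id)
    return index


raw = "Sparse matrix methods.\n\nDense matrix factorization.\n\nGraph theory notes."
docs = split_documents(raw)
index = build_index(docs)
print(sorted(index["matrix"]))  # ids of the documents containing "matrix"
```

Once the index is built, answering a one-word query is a single dictionary lookup, which is what makes word-based search fast on large collections.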

7 System Components - II
• Isearch, the Search engine - searches a document collection based on user-supplied query
• Command line search
  - Primarily used for testing
• WWW gateway (using CGI)
  - End-user interface using forms
• Z39.50 gateway

8 Isearch Capabilities
• Fast full-text search
  - US AIDS Patent Collection - can search ~250,000 patents in < 1 second
• Fielded search
  - Can restrict searches to title, author, abstract, other fields
• Relevance ranking
  - Search “hits” are assigned scores & sorted
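
Relevance ranking can be sketched as "score each hit, then sort by score". The slides do not say how Isearch computes its scores, so the example below substitutes a plain term-frequency count as a stand-in; the function names and sample documents are hypothetical.

```python
def score(document, query_terms):
    """Count how many times the query terms occur in the document."""
    doc_words = document.lower().split()
    return sum(doc_words.count(term) for term in query_terms)


def ranked_search(documents, query):
    """Return (score, doc_id) pairs for matching documents, best score first."""
    terms = query.lower().split()
    hits = [(score(doc, terms), doc_id) for doc_id, doc in enumerate(documents)]
    hits = [hit for hit in hits if hit[0] > 0]
    # Sort by descending score; ties keep the lower document id first.
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return hits


docs = [
    "patent for a vaccine",
    "vaccine trial and vaccine storage",
    "unrelated text",
]
print(ranked_search(docs, "vaccine"))
```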

9 Isearch Capabilities
• Word truncation
  - search for “matri*” matches “matrix” and “matrices”
• Boolean functions
  - AND, OR and ANDNOT combinations of different fields
• Customized presentation of results
• Phrase searching (coming soon)
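
Truncation and the Boolean operators map naturally onto prefix matching and set operations over an index. This is a minimal sketch under assumed data: the toy index and the `lookup` helper are hypothetical, and for brevity it combines terms from a single index rather than from different fields.

```python
# Hypothetical index: word -> set of document ids.
index = {
    "matrix": {0, 1},
    "matrices": {2},
    "norm": {1, 2},
    "vector": {3},
}


def lookup(index, term):
    """Return ids of documents matching the term; a trailing '*' means prefix match."""
    if term.endswith("*"):
        prefix = term[:-1]
        result = set()
        for word, doc_ids in index.items():
            if word.startswith(prefix):
                result |= doc_ids
        return result
    return set(index.get(term, set()))


matri = lookup(index, "matri*")  # matches both "matrix" and "matrices"
norm = lookup(index, "norm")

print(sorted(matri & norm))  # AND: documents matching both terms
print(sorted(matri | norm))  # OR: documents matching either term
print(sorted(matri - norm))  # ANDNOT: first term but not the second
```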

10 Isearch Customization
• What’s needed to customize Isearch?
  - Isearch is written in C++
  - Documents are C++ objects - data & procedures
  - Already have SGML & HTML, among others
  - Object technology allows code reusability, customizing only where differences from existing objects occur

11 Isearch Customization
• What’s needed to make arbitrary documents searchable?
  - Code to parse documents
  - Code to parse fields
  - Code to build brief and full result records
• Yes, it requires programming
• But, many of these are derived from existing procedures
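
The idea of deriving a new document type from an existing one, and overriding only what differs, can be sketched with a small class hierarchy. Isearch itself is written in C++; the Python below is only an illustration, and every class name, method name, and tag name in it is hypothetical rather than part of the Isearch code.

```python
import re


class DocType:
    """Hypothetical base document type with default parsing behaviour."""

    def parse_documents(self, raw):
        """Assume blank lines separate individual documents."""
        return [d.strip() for d in raw.split("\n\n") if d.strip()]

    def parse_fields(self, document):
        """By default the whole document is one unnamed block of text."""
        return {"text": document}

    def brief_record(self, document):
        """Use the title field if there is one, else the start of the text."""
        fields = self.parse_fields(document)
        return fields.get("title", document[:40])


class TaggedDocType(DocType):
    """Overrides only field parsing; document splitting and brief records are inherited."""

    def parse_fields(self, document):
        # Illustrative tags of the form <name>value</name>.
        pairs = re.findall(r"<(\w+)>(.*?)</\1>", document, re.S)
        return {name: value.strip() for name, value in pairs}


doc = "<title>Matrix norms</title><author>A. Smith</author>"
print(TaggedDocType().parse_fields(doc))
print(TaggedDocType().brief_record(doc))
```

Only `parse_fields` is rewritten in the derived type, which mirrors the point that new code is needed only where a document format differs from one already supported.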

12 Customization Example - Linear Algebra
• Inputs
  - SGML-tagged bibliographic records
  - TeX preprints
• Requirements
  - Field searching on title, author, abstract
  - Full-text search of preprints
  - WWW-based interface

13 Customization Example - Linear Algebra
• End products
  - HTML-tagged “brief records” - title, author and links to full bibliographic records and preprints
  - HTML formatted bibliographic records for display in WWW browser
  - Preprints for display or retrieval to local storage
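
An HTML-tagged brief record of the kind described above could be produced along these lines. This is a sketch only: the markup, the function name, and the example URLs are assumptions, not the format the project actually used.

```python
from html import escape


def brief_record_html(title, author, record_url, preprint_url):
    """Build one HTML list item with the title, author, and two links."""
    return (
        '<li>{title}, {author} '
        '[<a href="{rec}">record</a>] '
        '[<a href="{pre}">preprint</a>]</li>'
    ).format(
        title=escape(title),
        author=escape(author),
        rec=escape(record_url, quote=True),
        pre=escape(preprint_url, quote=True),
    )


# Hypothetical values for illustration.
out = brief_record_html("Bounds & norms", "A. Smith", "rec/1.html", "pre/1.tex")
print(out)
```

Escaping the title and author keeps characters such as `&` or `<` in bibliographic data from breaking the generated page.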

14 Customization Example - Linear Algebra
• Sample Bibliographic Record
  #### ## Title text Author Name Abstract text Preprint.filename ###-###

15 Customization Example - Linear Algebra
• Isearch Modifications
  - ~1 week coding and testing, mostly in developing presentation customizations
  - Additional work to develop ingest and on-the-fly formatting scripts, code deployment at ESI
  - Now have basic code to handle SGML documents using Elsevier DTD

