The Future of Isite - Growing GILS
Archie Warnock
A/WWW Enterprises
What Is Isite?
- Isite is a standards-based Internet toolkit for information search and retrieval (Z39.50)
- Isite was developed by MCNC/CNIDR
- Isite was intended as a replacement for freeWAIS
- Funded by a US NSF grant
- There are other good Z39.50 toolkits, too
Isite Architecture
- Isite is written in C++ to gain the usual object-oriented advantages
- Major components:
  - Isearch - the search and retrieval engine
  - SAPI - the Z39.50 search engine API
  - Zdist - the Z39.50 implementation
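The three-layer split (engine, engine API, protocol) can be illustrated with a minimal C++ sketch. This is not Isite code: `SearchEngine`, `InvertedIndexEngine`, and `ProtocolFrontEnd` are hypothetical names standing in for the roles SAPI, Isearch, and Zdist play, and the `SEARCH <term>` request format is invented for the example rather than real Z39.50 encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical interface in the role SAPI plays: protocol code reaches the
// search engine only through this abstraction. Not the real SAPI.
class SearchEngine {
public:
    virtual ~SearchEngine() = default;
    // Returns the identifiers of documents that contain `term`.
    virtual std::vector<std::string> Search(const std::string& term) const = 0;
};

// Toy in-memory inverted index standing in for Isearch.
class InvertedIndexEngine : public SearchEngine {
public:
    void AddDocument(const std::string& id, const std::string& text) {
        std::istringstream words(text);
        std::string word;
        while (words >> word) index_[word].insert(id);  // whitespace-split terms
    }
    std::vector<std::string> Search(const std::string& term) const override {
        const auto it = index_.find(term);
        if (it == index_.end()) return {};
        return std::vector<std::string>(it->second.begin(), it->second.end());
    }
private:
    std::map<std::string, std::set<std::string>> index_;  // term -> doc ids
};

// Stand-in for the protocol layer (the role Zdist plays): it decodes a
// request and delegates to whichever engine it was given.
class ProtocolFrontEnd {
public:
    explicit ProtocolFrontEnd(std::shared_ptr<const SearchEngine> engine)
        : engine_(std::move(engine)) {
        assert(engine_ != nullptr);
    }
    // Invented request format "SEARCH <term>"; returns the hit count,
    // or 0 for requests it does not understand.
    std::size_t Handle(const std::string& request) const {
        const std::string prefix = "SEARCH ";
        if (request.compare(0, prefix.size(), prefix) != 0) return 0;
        return engine_->Search(request.substr(prefix.size())).size();
    }
private:
    std::shared_ptr<const SearchEngine> engine_;
};
```

Because the front end depends only on the abstract interface, a different engine could be plugged in without touching the protocol code, which is presumably the point of keeping a separate search engine API.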
Isite Architecture - Example Programs
- Iindex, Isearch, Iutil - the search engine
- Isearch-cgi - the CGI gateway to Isearch
- zclient, izclient, zping, zbatch - the Z39.50 clients
- zserver, zserverNT - the Z39.50 servers
- zcon & zgate - the WWW-to-Z39.50 gateway
Current Status of Isite
- MCNC/CNIDR funding from NSF is finished
  - Successful completion of 3-year grant
  - Jim Fullton, PI, is now at WIPO in Geneva
  - No additional support is anticipated
- Other projects are supporting customization
  - FGDC, US Dept. of Commerce, US Patent & Trademark Office, CEO, STScI, World Bank, BSn
Isite Strengths
- Powerful and flexible search engine
- Community-based development of a reference implementation
- Freely distributed and widely available for any use
- Source code included
- Powerful search engine interface
- Ported to Windows NT with a threaded Z39.50 server
Isearch Features
- Full-text search
- Search on text fields
- Search on numeric fields with appropriate relations (>, <, =)
- Search on date fields with appropriate relations (before, during, after)
- Search on geospatial bounding box
- Boolean searches
- Phrase searching
- Right truncation
- Proximity searching (within N characters)
- Case-insensitive searching, punctuation ignored
- Configurable stopword list
- Customizable results presentation
- Relevance-ranked scores
- Term weighting
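A few of these features can be sketched in C++ to show what each one means operationally: case-insensitive tokenizing that treats punctuation as a separator, right truncation with a trailing `*`, proximity within N characters, and bounding-box overlap. This is a hypothetical illustration, not Isearch's implementation; the function names, the `*` truncation syntax, and measuring proximity between token start offsets are all assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative sketch of a few Isearch-style features; not Isearch source.

struct Token {
    std::string text;    // lowercased, punctuation removed
    std::size_t offset;  // character offset where the token starts
};

std::string Lower(std::string s) {
    for (char& c : s) c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Case-insensitive tokenizer: letters and digits form words, and any other
// character (punctuation, whitespace) acts as a separator.
std::vector<Token> Tokenize(const std::string& text) {
    std::vector<Token> tokens;
    std::string current;
    std::size_t start = 0;
    for (std::size_t i = 0; i <= text.size(); ++i) {
        // Treat the position past the end as a separator to flush the last word.
        unsigned char c = i < text.size() ? static_cast<unsigned char>(text[i])
                                          : static_cast<unsigned char>(' ');
        if (std::isalnum(c)) {
            if (current.empty()) start = i;
            current += static_cast<char>(std::tolower(c));
        } else if (!current.empty()) {
            tokens.push_back({current, start});
            current.clear();
        }
    }
    return tokens;
}

// Exact match, or right truncation when the pattern ends in '*'.
bool MatchesTerm(const std::string& token, const std::string& pattern) {
    const std::string p = Lower(pattern);
    if (!p.empty() && p.back() == '*') {
        const std::string stem = p.substr(0, p.size() - 1);
        return token.compare(0, stem.size(), stem) == 0;
    }
    return token == p;
}

// Proximity: true if some match of `a` starts within `n` characters of some
// match of `b` (distance measured between token start offsets).
bool WithinChars(const std::vector<Token>& tokens, const std::string& a,
                 const std::string& b, std::size_t n) {
    for (const Token& x : tokens) {
        if (!MatchesTerm(x.text, a)) continue;
        for (const Token& y : tokens) {
            if (!MatchesTerm(y.text, b)) continue;
            std::size_t gap = x.offset > y.offset ? x.offset - y.offset : y.offset - x.offset;
            if (gap <= n) return true;
        }
    }
    return false;
}

// Geospatial bounding box in degrees; boxes crossing the 180th meridian are
// not handled in this sketch.
struct BoundingBox {
    double west, south, east, north;
};

// True if the two boxes share any area (touching edges count).
bool Overlaps(const BoundingBox& a, const BoundingBox& b) {
    assert(a.south <= a.north && b.south <= b.north);
    return a.west <= b.east && b.west <= a.east &&
           a.south <= b.north && b.south <= a.north;
}
```

For example, tokenizing `"Geospatial data, GEOLOGY maps."` yields `geospatial`, `data`, `geology`, and `maps`, and the truncated pattern `geo*` matches both `geospatial` and `geology`.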
Isearch Document Types
- ASCII text
- USMARC records
- Electronic mail folders
- Usenet news archives
- US patents
- IAFA templates
- BibTeX
- Filenames
- First line in file
- SGML tagged fields
  - HTML
  - GILS templates
  - FGDC templates
- Colon-delimited fields
  - GCMD DIF templates
- whois++ templates
- Multi-file documents
- Medline
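To make "colon-delimited fields" concrete, here is a minimal C++ sketch of a parser for records in a `Name: value` layout such as DIF-style templates. The layout rules are assumptions for illustration, not Isearch's or GCMD's actual specification: a field line starts in column 1, field names use letters, digits, and underscores, and an indented line continues the previous field. `ParseColonFields` and its helpers are hypothetical names.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Hypothetical rules: a field line starts in column 1 with "Name: value",
// where Name uses letters, digits, and underscores; an indented line
// continues the previous field's value.

std::string Trim(const std::string& s) {
    const std::size_t first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

bool IsFieldName(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (!std::isalnum(c) && c != '_') return false;
    }
    return true;
}

// Repeated fields (e.g. several keywords) are kept, hence the multimap.
std::multimap<std::string, std::string> ParseColonFields(const std::string& record) {
    std::multimap<std::string, std::string> fields;
    auto previous = fields.end();
    std::istringstream in(record);
    std::string line;
    while (std::getline(in, line)) {
        const bool indented =
            !line.empty() && std::isspace(static_cast<unsigned char>(line[0]));
        const std::size_t colon = line.find(':');
        if (!indented && colon != std::string::npos && IsFieldName(line.substr(0, colon))) {
            previous = fields.emplace(line.substr(0, colon), Trim(line.substr(colon + 1)));
        } else if (indented && previous != fields.end()) {
            // Continuation line: join it to the previous value with one space.
            const std::string more = Trim(line);
            if (more.empty()) continue;
            if (!previous->second.empty()) previous->second += ' ';
            previous->second += more;
        }
        // Any other line (blank, or not in "Name: value" form) is ignored.
    }
    return fields;
}
```

Requiring continuation lines to be indented keeps a wrapped value such as a URL (`http://...`) from being misread as a new field named `http`.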
Isite Weaknesses
- Modest Z39.50 implementation; needs:
  - GRS-1
  - Better USMARC support
  - Data structures
- All examples are console applications
- No real end-user applications
- No GUI
- Difficult configuration
- Extensions require programming
- Needs optimization and performance enhancement
- Needs more documentation
What The Future Holds For Isite
- New projects want (and will get):
  - Distributed document collections
  - Distributed searching
  - Automated information extraction (centroids, templates)
  - Searching and referrals
  - Additional Z39.50 support (many Z39.50 details are not supported now)
GILS and the Advanced Search Facility
- ASF is a US Dept. of Commerce project, to be built by Pilot Research, MCNC and A/WWW Enterprises
- “GILSnet”: a network of cooperative, low-impact, distributed nodes
- The basic interchange format will be GILS templates
- Search on full text and GILS records
GILS, Dublin Core and Everyone Else
- Dublin Core is a minimal (15 fields) generic metadata scheme for virtually any kind of document
- GILS takes a more detailed approach that includes most of DC, providing greater interoperability
- GILS is less bibliographically oriented than BIB-1
- GILS is lightweight compared to GEO and CIP (which have specific functional requirements)
What GILS Means To Me - 1
- Fewer fields:
  - More documents
  - More metadata records
  - Skinnier metadata records
  - Easier abstraction
- More fields:
  - Fewer documents
  - Fewer metadata records
  - Fatter metadata records
  - Less abstraction
- GILS is a good, general compromise
What GILS Means To Me - 2
- Think of the GILS profile as defining a language
  - At some level, Z39.50 is a detail
  - Protocols are about communication, profiles are about abstraction, and GILS is about content
  - Z39.50 guarantees that the user’s query can be unambiguously decoded, but makes no guarantees about content
- We could implement the profile over any protocol: HTTP, CORBA, etc.
- Does GILS have to use Z39.50? No, but the abstraction is required
  - Z39.50 already includes the abstraction model
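The idea that the profile is an abstraction separate from the transport can be illustrated with one query model and two encodings. In this hypothetical C++ sketch, `Clause`, `Query`, and both encoder functions are invented names; the Z39.50-flavored output uses the Prefix Query Format (PQF) notation from the YAZ toolkit with Bib-1 Use attributes (4 = Title, 1003 = Author), while the HTTP parameter names are made up. Neither encoding is meant as the GILS profile itself.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// A protocol-neutral query: each clause names an access point and a term,
// and all clauses are ANDed. Hypothetical types, not part of Isite or GILS.
struct Clause {
    std::string field;  // access point name used by the HTTP encoding
    int use_attribute;  // Bib-1 Use attribute (4 = Title, 1003 = Author)
    std::string term;
};
using Query = std::vector<Clause>;

// Z39.50-flavored encoding in YAZ prefix query format (PQF). For n clauses,
// n - 1 leading "@and" operators build a left-nested conjunction.
std::string ToPrefixNotation(const Query& query) {
    std::string out;
    for (std::size_t i = 1; i < query.size(); ++i) out += "@and ";
    for (std::size_t i = 0; i < query.size(); ++i) {
        // This sketch does no escaping, so terms must not contain quotes.
        assert(query[i].term.find('"') == std::string::npos);
        if (i > 0) out += ' ';
        out += "@attr 1=" + std::to_string(query[i].use_attribute) +
               " \"" + query[i].term + "\"";
    }
    return out;
}

// The same query as an HTTP query string; parameter names are hypothetical
// and terms are assumed to need no URL encoding.
std::string ToHttpQueryString(const Query& query) {
    std::string out;
    for (std::size_t i = 0; i < query.size(); ++i) {
        if (i > 0) out += '&';
        out += query[i].field + "=" + query[i].term;
    }
    return out;
}
```

The abstract `Query` carries the meaning; each encoder only decides how that meaning travels, which mirrors the slide's split between abstraction (profile) and communication (protocol).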
Related Documents
- Getting Isite
  - ftp://ftp.cnidr.org/pub/software/Isite
  - ftp://ftp.clark.net/pub/warnock/Software (pre)
- A/WWW Enterprises
  - US Phone/FAX: