Web Indexing and Searching
By Florin Zidaru
Outline
- Web Indexing and Searching Overview
- Swish-e: overview and features
- Swish-e: set-up
- Swish-e: demo
Overview: Web Indexing
- A convenient way to let clients retrieve information from a web site is to build an index of its pages and provide a search capability
- This is not a simple task
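To make "building an index" concrete, here is a minimal sketch of an inverted index in Python, the basic data structure behind keyword search: each word maps to the set of pages that contain it. It is an illustration only, not how Swish-e stores its index; the helper names `tokenize`, `build_index`, and `search` and the sample pages are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return dict(index)


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the pages that contain every query word (boolean AND)."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))  # copy so the index is not mutated
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Hypothetical sample site
pages = {
    "/index.html": "Welcome to our home page",
    "/office.html": "Our office hours and location",
    "/furniture.html": "Desks and chairs for the home office",
}
index = build_index(pages)
print(sorted(search(index, "home office")))  # ['/furniture.html']
```

A query like this only finds the pages that match; deciding which matching page should come first is a separate problem.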
Overview
- Many websites choose to provide search capabilities
- The major problem: the relevance of the search results
- Example: a search for "Home Office"
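One common way to attack relevance is to rank the matching pages with a TF-IDF score (term frequency times inverse document frequency). The sketch below is a hypothetical illustration, not Swish-e's ranking formula; the sample pages, the function name `tf_idf_rank`, and the smoothed IDF `log(1 + N/df)` are assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf_rank(pages, query):
    """Return page URLs that match the query, best TF-IDF score first."""
    n = len(pages)
    tokens = {url: tokenize(text) for url, text in pages.items()}
    # Document frequency: how many pages contain each word at least once.
    df = Counter()
    for words in tokens.values():
        df.update(set(words))
    scores = {}
    for url, words in tokens.items():
        counts = Counter(words)
        score = 0.0
        for term in tokenize(query):
            if counts[term]:
                tf = counts[term] / len(words)       # how often the term occurs here
                idf = math.log(1 + n / df[term])     # rarer terms weigh more
                score += tf * idf
        if score > 0:
            scores[url] = score
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical sample pages for the "Home Office" query
pages = {
    "/ministry.html": "The Home Office is a ministerial department of the UK government",
    "/desks.html": "Home office desks: office desks and office chairs for working from home",
    "/recipes.html": "Easy recipes to cook at home",
}
print(tf_idf_rank(pages, "Home Office"))  # ['/desks.html', '/ministry.html', '/recipes.html']
```

With this sample data the furniture page ranks first because it repeats "office" most often, even if the user was looking for the government department. A purely statistical score cannot tell which meaning was intended, which is one plausible reading of why the "Home Office" example is hard.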
Swish-e: Overview
- Simple Web Indexing System for Humans - Enhanced
- Indexes web pages, text files, mailing list archives, or data stored in a relational database
- Fast, flexible, and free
- Open source and highly configurable
Swish-e: Features
- Ideally suited for collections of one million documents or fewer
- Quickly indexes large numbers of documents in text, HTML, and XML
- Can also index other file types, such as PDF, gzip, or PostScript
- Includes a web spider for indexing remote documents over HTTP
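The idea behind a web spider can be sketched as a breadth-first crawl: fetch a page, extract its links, and queue the links that stay on the same site. This is not Swish-e's spider; it is a hypothetical Python sketch using only the standard library, and `crawl`, `fetch_page`, `max_pages`, and the same-host restriction are assumptions. The `fetcher` parameter lets the crawl be exercised without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collect the href targets of <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_page(url):
    """Download a page over HTTP and decode it (assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, max_pages=50, fetcher=fetch_page):
    """Breadth-first crawl limited to the start URL's host.

    Returns {url: html} for every page fetched, ready to be indexed.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = fetcher(url)
        except OSError:
            continue  # skip pages that fail to download
        pages[url] = html
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return pages


# Example (needs network access):
# pages = crawl("http://example.com/", max_pages=10)
```

Limiting the crawl to one host and capping the page count keeps the spider from wandering across the whole web.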
Swish-e: Features
- Document summaries can be returned with each search result
- Phrase searching and wildcard searching
- Searches can be limited to parts of documents, such as certain HTML tags (META, TITLE, comments, etc.) or XML elements
- Searches can easily be limited to part or all of a web site
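Phrase searching, trailing-wildcard searching, and limiting a search to one part of a document can all be built on a positional index that records where each word occurs and in which field. The sketch below is a hypothetical Python illustration, not Swish-e's implementation or query syntax; the class `FieldIndex` and the field names `title` and `body` (standing in for HTML tags or XML elements) are assumptions.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lower-case text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


class FieldIndex:
    """Positional inverted index: (field, word) -> {doc: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc, fields):
        """Index a document; fields maps a field name such as 'title' to its text."""
        for field, text in fields.items():
            for pos, word in enumerate(tokenize(text)):
                self.postings[(field, word)][doc].append(pos)

    def _expand(self, field, term):
        """Words in this field matching term; a trailing '*' means 'any suffix'."""
        if term.endswith("*"):
            prefix = term[:-1]
            return [w for (f, w) in self.postings if f == field and w.startswith(prefix)]
        return [term]

    def _positions(self, field, term):
        """Per document, the union of positions of every word matching term."""
        merged = defaultdict(set)
        for word in self._expand(field, term):
            for doc, positions in self.postings.get((field, word), {}).items():
                merged[doc].update(positions)
        return merged

    def phrase(self, query, field="body"):
        """Documents where the query terms appear consecutively within one field."""
        terms = query.lower().split()
        if not terms:
            return set()
        first = self._positions(field, terms[0])
        rest = [self._positions(field, t) for t in terms[1:]]
        results = set()
        for doc, starts in first.items():
            # Term i must sit exactly i positions after the first term.
            if any(all(start + offset in positions.get(doc, ())
                       for offset, positions in enumerate(rest, start=1))
                   for start in starts):
                results.add(doc)
        return results


idx = FieldIndex()
idx.add("a.html", {"title": "Home Office", "body": "The home office department"})
idx.add("b.html", {"title": "Office chairs", "body": "chairs for your office at home"})
print(idx.phrase("home office"))             # {'a.html'}
print(idx.phrase("chair*", field="title"))   # {'b.html'}
```

Keeping positions per field is what makes both features cheap: a phrase is a check on adjacent positions, and a field limit is just a different key.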
Swish-e: Set-Up
- Requires a web server (e.g., Apache)
- Building from source requires a C compiler (gcc recommended)
- Windows versions are also available
Swish-e: Set-Up
1. Download Swish-e from swish-e.org and install it
2. Generate the index
3. Set up a CGI script
4. Tell the CGI script where to find the index
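The last two steps, setting up a CGI script and telling it where the index is, can be sketched as a tiny Python CGI program that passes the user's query to the `swish-e` command-line program with `-w` (search words) and `-f` (index file). This is a hypothetical stand-in, not any script shipped with Swish-e: the index path `/var/www/swish/index.swish-e`, the form parameter name `query`, and the output filtering (which assumes header lines start with `#` and a lone `.` ends the results) are assumptions to check against your installation.

```python
#!/usr/bin/env python3
"""Minimal CGI search front end that runs the swish-e binary (sketch)."""
import html
import os
import subprocess
from urllib.parse import parse_qs

# Assumption: where the index generated earlier lives. Adjust for your site.
INDEX_FILE = "/var/www/swish/index.swish-e"
SWISH_BINARY = "swish-e"


def build_command(query, index_file=INDEX_FILE):
    """Argument list for: swish-e -w <query> -f <index file>."""
    return [SWISH_BINARY, "-w", query, "-f", index_file]


def result_lines(output):
    """Keep result lines; drop '#' header lines and the closing '.' line."""
    return [line for line in output.splitlines()
            if line and not line.startswith("#") and line != "."]


def main():
    params = parse_qs(os.environ.get("QUERY_STRING", ""))
    query = params.get("query", [""])[0].strip()  # hypothetical form field name
    print("Content-Type: text/html; charset=utf-8")
    print()
    if not query:
        print("<p>Please enter a search term.</p>")
        return
    # Passing a list (no shell) keeps the query from being interpreted by a shell.
    proc = subprocess.run(build_command(query), capture_output=True, text=True)
    print("<ul>")
    for line in result_lines(proc.stdout):
        print(f"<li>{html.escape(line)}</li>")
    print("</ul>")


if __name__ == "__main__":
    main()
```

The index location is a single constant, so pointing the script at a rebuilt or relocated index means changing one line.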
Swish-e: Demo
- Demo of my installation
- Demos of other installations: