Connecting Diverse Web Search Facilities Udi Manber, Peter Bigot Department of Computer Science University of Arizona Aida Gikouria - M471 University of Athens, 2002
Searching the Web: Users need access to an enormous amount of information on the WWW. Commonly used: global search engines, which collect as much as possible into flat, keyword-based databases. New idea: connect many diverse web search facilities (techniques and software).
Web Search Facilities: Global search engines, e.g. AltaVista, Infoseek, Excite, Lycos, collect as many web pages as possible and classify them by keyword (automatically or manually). Users search the Web through the resulting keyword database. Inherent limitations: - noise (irrelevant results) - time-consuming searches - not a scalable solution. The challenge: provide ways to focus and customize search better without making it too difficult or inefficient.
Two-level Search Idea: Specific databases for specific topics, similar to using a library's subject card catalog. One interface gives access to hundreds of existing search facilities in many different customizable ways. First phase: find the right database. Second phase: search for the desired information within that database.
Search Broker: A collection of 400 different search providers. Each search server covers a certain subject or category. Each category is identified by one or two words and associated with a list of aliases. ! The search engines and the aliases are collected manually by a librarian.
Examples: Question: How do you delete a directory in Unix? Subject and question: “Subject: unix; Query: delete directory;” Search Broker syntax: “unix delete directory”. ! The first word denotes the subject (the user identifies it). The rest is the question.
Search Broker Steps: 1. It searches its own database of subjects and aliases and finds the search engine corresponding to the subject. 2. The rest of the query is reformatted into the form expected by that search engine. 3. An HTTP request is sent to the search engine with the appropriate fields. 4. The results are sent back to the user.
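Steps 1 to 3 can be illustrated with a minimal Python sketch. This is not the Search Broker's actual implementation: the catalog entries, aliases, URLs and field names below are hypothetical, and the sketch only builds the request URL rather than sending it or relaying results (step 4).

```python
from urllib.parse import urlencode

# Hypothetical catalog: subject -> aliases, search URL, and query field name.
# The real Search Broker catalog was compiled by hand; these entries are made up.
CATALOG = {
    "unix": {
        "aliases": {"linux", "shell"},
        "url": "http://example.org/unix-search",
        "field": "q",
    },
    "recipes": {
        "aliases": {"cooking", "food"},
        "url": "http://example.org/recipe-search",
        "field": "keywords",
    },
}


def find_provider(subject):
    """Step 1: match the subject against subject names and aliases."""
    key = subject.lower()
    for entry in CATALOG.values():
        if key in CATALOG and CATALOG[key] is entry or key in entry["aliases"]:
            return entry
    return None


def build_request(query):
    """Steps 2 and 3: reformat the rest of the query and build the request URL."""
    words = query.split()
    if not words:
        raise ValueError("empty query")
    subject, rest = words[0], words[1:]
    entry = find_provider(subject)
    if entry is None:
        raise LookupError(f"no search provider for subject {subject!r}")
    # Each provider expects the question in its own form field.
    return entry["url"] + "?" + urlencode({entry["field"]: " ".join(rest)})
```

With this catalog, `build_request("unix delete directory")` and `build_request("shell delete directory")` resolve to the same provider, because "shell" is listed as an alias of "unix".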
Conclusions: Search Broker complements existing search engines. It occupies the middle ground between completely automated search systems on the one hand and manual collection of information on the other.
The Universal Search Interface: A client tool that extends Search Broker: - users pick search facilities and customize them - tools connect several search engines ( -- or =) to pipe results between searches (web and/or local). One common interface. Users construct “search scenarios” that combine several facilities.
Issues of Implementation: Generality: must be able to accommodate most search facilities. Customizability: users set preferences and customize search facilities. Ease of integration: adding or extending search facilities. ! Tradeoffs are necessary. ! Keep it as simple as possible.
Functionality: 1. Maintain favorite lists and bookmarks. 2. Build your own Search Broker: - by extending items of a hot list to be active - by triggering a search. 3. Construct complex search scenarios. Concepts: A “Search Object” encompasses the interface, options and result formatting for each search engine. “Input and Output Schemas” are defined for search objects. “Schema Converting Objects” extract, filter and reformat the results of intermediate searches.
USI: Collect search objects. Customize objects. Combine several objects. Create your own interface to scenarios. Provide GUI tools that activate and modify scenarios (type checking, modification tools, import/export, organization).
Composing Scenarios: Query multiple servers and combine the results. Use the results of one query as input to another. The user may examine intermediate results. Web browsing includes personal ‘web’ actions. Objects have specific input and output schemas: Search Objects, Translator Objects (convert between different schemas), Filtering Objects (extract specific information).
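The idea of chaining objects by their input and output schemas can be sketched in Python as follows. This is an illustrative assumption, not the USI design: the `SearchObject` class, the schema names ("keywords", "records", "urls") and the stubbed search function are all hypothetical, and schemas are reduced to plain string labels.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchObject:
    """A hypothetical scenario step with a declared input and output schema."""
    name: str
    input_schema: str
    output_schema: str
    run: Callable[[list], list]


def compose(objects: List[SearchObject]) -> Callable[[list], list]:
    """Type-check adjacent schemas, then return a function that pipes data through."""
    for a, b in zip(objects, objects[1:]):
        if a.output_schema != b.input_schema:
            raise TypeError(
                f"{a.name} outputs {a.output_schema!r} "
                f"but {b.name} expects {b.input_schema!r}"
            )

    def scenario(data: list) -> list:
        for obj in objects:
            data = obj.run(data)  # result of one step is the input to the next
        return data

    return scenario


# Stubbed search object: a real one would query a remote search facility.
search = SearchObject(
    "stub-search", "keywords", "records",
    lambda kws: [{"title": f"{k} manual", "url": f"http://example.org/{k}"} for k in kws],
)

# Filtering object: extracts only the URLs from the records.
url_filter = SearchObject(
    "url-filter", "records", "urls",
    lambda recs: [r["url"] for r in recs],
)

scenario = compose([search, url_filter])
```

Calling `scenario(["unix"])` runs the stubbed search and then the filter. Composing the two objects in the opposite order fails the schema check before anything runs, which is the kind of type checking the GUI tools are described as providing.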