Presentation is loading. Please wait.

Presentation is loading. Please wait.

Goal: build a decentralized, distributed search engine framework no single point of failure not controlled by any one administration or corporation Design.

Similar presentations


Presentation on theme: "Goal: build a decentralized, distributed search engine framework no single point of failure not controlled by any one administration or corporation Design."— Presentation transcript:

1 Goal: build a decentralized, distributed search engine framework no single point of failure not controlled by any one administration or corporation Design challenge: scalable, robust, and adaptable to changing topic popularity Approach: partition the search space by semantic topic using a hierarchical taxonomy generic topics are higher up, more specific topics are lower news sportsscience radioTVprint volleyballcomp. sci.biology networkstheorysystemsdatabase Nelson Tang (tang@cs.ucla.edu) and Lixia Zhang (lixia@cs.ucla.edu) http://irl.cs.ucla.edu/IDG/ Future work: measure effects of enhancements: system-wide “Hot Topics” cache, cross-references, duplicate query detection other trace data: UCLA traffic, more traces needed! Components of the IDG System: Cache Client Topical group: top-level Topical group: news Manager: news Manager: radio Manager: print Topical group: science Manager: science Manager: comp. sci. Manager: biology Manager: sports Manager: TV Topical group: sports Manager: vball.......….… data source - represents an information provider (e.g., a website) manager - assigned a topic from the taxonomy; holds listings of data sources with that topic client - searches for managers that have listings of interesting data sources topical group - groups together related managers cache - helps clients find popular managers quickly INTERNETRESEARCH LAB How a query is answered: Topical group: top-level Topical group: science Manager: science Manager: biology Manager: comp. sci. Topical group: comp. sci. Manager: networks Client Topical group: top-level Topical group: science Manager: science Manager: biology Manager: comp. sci. cache IDG query: “TCP” client issues query to its local cache cache searches internal memory, sending query to best matching IDG manager response: “networks” manager IDG finds true best manager and responds to cache, who relays response to client IDG directory adapts to changing topic popularity: Topical group: science Manager: science Manager: biology Manager: comp. sci. Topical group: comp. sci. Manager: networks...... science manager is overloaded with too many data sources Topical group: science Manager: science Manager: biology Manager: comp. sci. Topical group: comp. sci. Manager: networks Manager: physics.......… new manager from pool of free managers is activated and assigned the topic physics data sources more specific to physics are moved to new manager Maintaining the IDG directory: Manager: golf Manager: hockey Manager: tennis Other managers: golf tennis Other managers: hockey tennis Topical group: sports Manager: tennis managers periodically multicast heartbeat messages managers learn about other managers for failure recovery and to help forward queries Topical group: sports Topical group: tennis Manager: hockey Manager: golf Manager: tennis Proxy: tennis New YorkLos Angeles Manager: lessons Manager: players managers use locally- scoped multicast to limit their heartbeats a unicast channel connects the manager and proxy faraway managers use a proxy to maintain a presence in the other scope Associate topics with locations to reduce heartbeat bandwidth: Simulation configuration: implemented using Parsec language Excite search engine trace over 24 hours; approx. 2.5 million queries, 537,000 unique users (IP addresses) queries hashed into a manually-built taxonomy based on Yahoo directory to simulate data source registration, queries treated as data sources Simulation configuration: implemented using Parsec language Excite search engine trace over 24 hours; approx. 2.5 million queries, 537,000 unique users (IP addresses) queries hashed into a manually-built taxonomy based on Yahoo directory to simulate data source registration, queries treated as data sources Hierarchy stability # of data sources per manager Multicast overhead % of total multicast with global scope Hierarchy overhead # of managers per group Query search time # of hops per query


Download ppt "Goal: build a decentralized, distributed search engine framework no single point of failure not controlled by any one administration or corporation Design."

Similar presentations


Ads by Google