
1 Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen {yee, jiadong, nguylin}@iit.edu Information Retrieval Laboratory Illinois Institute of Technology Chicago, IL USA

2 Yee, Jia, Nguyen OSWIR, 2005 Workshop, Compiegne, France 2 Goal To motivate research in peer-to-peer information retrieval (P2P IR). To model P2P IR in terms of a metasearch engine.

3 Model Peers share data objects, each described with a descriptor (bag of terms). Peers are connected in a random graph. Queries (bag of terms) are routed to peers (servers) that return references to data objects O s.t. $D_O \supseteq Q$, where $D_O$ is the descriptor of O. Each descriptor also contains the hash value of the data object.
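The matching condition in the model can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names `matches` and `local_results` and the `(hash_key, descriptor_terms)` tuple layout are assumptions made for the example, and terms are compared exactly, with no case folding or stemming.

```python
from collections import Counter


def matches(query_terms, descriptor_terms):
    """Return True if the descriptor bag contains the query bag (D_O ⊇ Q)."""
    query = Counter(query_terms)
    descriptor = Counter(descriptor_terms)
    # Every query term must occur in the descriptor at least as often.
    return all(descriptor[term] >= count for term, count in query.items())


def local_results(query_terms, shared_objects):
    """Filter one peer's shared objects down to those matching the query.

    shared_objects is a list of (hash_key, descriptor_terms) pairs
    (a hypothetical layout chosen for this sketch).
    """
    return [(h, d) for h, d in shared_objects if matches(query_terms, d)]
```

A peer holding `[("h1", ["Mozart", "Piano", "Concerto"]), ("h2", ["Mozart", "Requiem"])]` would return only the first object for the query `["Mozart", "Concerto"]`.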

4 Metadata Distribution Example Assume Q = {Mozart, Concerto}. All descriptors contain Q. [Figure: ungrouped results, listing the hash key and sources of each descriptor.]

5 Motivation for Model Peer-to-peer file sharing. Millions of users. Petabytes of data. Data objects are replicated. A replica's descriptor is independently maintained.

6 Metasearch Engines Search other search engines. Examples: dogpile.com, askjeeves.com.

7 Main Metasearch Engine Activities Source selection: which search engines to search. Query dispatching: translating a query to a local format. Result selection: picking from the multiple result sets. Result merging: unifying/ranking the selected results.

8 Source Selection Metasearch engine: employs profiles of each search engine to make its decision. P2P file-sharing system: routing by flooding, use of statistics of neighbors, or distributed hash tables. Cost is related to peer autonomy.
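Flooding, the simplest of the routing options listed, can be sketched as a breadth-first traversal of the peer graph. The hop limit `ttl` is an assumption added for this example (the slides do not specify one), and `flood` and the adjacency-dict graph are illustrative names rather than any real client's API.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Return the set of peers a query reaches when flooded from origin.

    graph maps each peer to a list of its neighbors; ttl is an assumed
    hop limit after which a peer stops forwarding the query.
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # this peer receives the query but does not forward it
        for neighbor in graph.get(peer, ()):
            if neighbor not in reached:
                reached.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return reached
```

On a chain `a - b - c - d`, flooding from `a` with `ttl=2` reaches `a`, `b`, and `c` but not `d`, which shows why flooding trades coverage against message cost.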

9 Query Dispatching Metasearch engine: one search engine may use a vector space model, and another might use a Boolean model. P2P file-sharing system: some search engines, such as eMule, access multiple networks.

10 Result Selection Metasearch engine: some result lists might be pruned if they come from less relevant search engines; uses search engine profiles. P2P file-sharing system: generally, all results are sent to the client.

11 Result Merging Metasearch engine: uses rankings from individual lists and profiles of search engines. P2P file-sharing system: group results, then rank based on likelihood of successful download (group size, connection quality).
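The P2P side of result merging can be illustrated by grouping replicas on their hash value and ranking groups by size. This is a sketch under assumptions: the `(hash_key, peer, descriptor_terms)` tuple layout and the name `group_and_rank` are hypothetical, and connection quality is left out because the slides give no way to measure it.

```python
from collections import defaultdict


def group_and_rank(results):
    """Group results by hash key and rank groups by size, largest first.

    results is an iterable of (hash_key, peer, descriptor_terms) tuples
    (a hypothetical layout chosen for this sketch).
    """
    groups = defaultdict(list)
    for hash_key, peer, descriptor in results:
        groups[hash_key].append((peer, descriptor))
    # A larger group means more replicas, hence a likelier successful download.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)
```

Because Python's sort is stable, groups of equal size keep the order in which their first result arrived.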

12 Example Search on Limewire's Gnutella [Screenshot: query (number of results), descriptors, and group size.]

13 Basic Difference Metasearch engines assume a fixed and reliable set of search engines. → Can collect statistics on search engines to improve query processing and results.

14 P2P File Sharing Research Areas (1/2) Source selection: Inexpensive routing with autonomous peers. Query dispatching: Translating queries to maximize precision and recall of final result set.

15 P2P File Sharing Research Areas (2/2) Result selection: Usage of queries and local statistics to prune returned results. Result merging: Usage of replication and distributed metadata to improve rankings. Recall: link analysis for Web search.

16 Goals of Open Source in P2P File-Sharing Systems Allow the communal development of the technology: new routing techniques, new ranking functions. Disclose all functionality: better security, no spyware.

17 Examples of Openness in P2P File-Sharing Gnutella is an open protocol. Limewire, Bearshare, Kazaa. Limewire publishes an open-source implementation of the Gnutella protocol. eMule is another open-source project built on a competing protocol.

18 Conclusion Many research areas: P2P file sharing can be modeled as a form of metasearch engine. High impact: many users and petabytes of data. There already exists an active open-source community: a large community of users and much source code exist.

19 Questions and Contact Information Wai Gen Yee, yee@iit.edu, ir.iit.edu/~waigen (recent results and publications). Information Retrieval Laboratory, Illinois Institute of Technology, ir.iit.edu

