Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen {yee, jiadong,

Presentation transcript:

Search in Peer-to-Peer File-Sharing Systems: Like Metasearch Engines, But Not Really Wai Gen Yee, Dongmei Jia, Linh Thai Nguyen {yee, jiadong, Information Retrieval Laboratory Illinois Institute of Technology Chicago, IL USA

Yee, Jia, Nguyen. OSWIR 2005 Workshop, Compiegne, France.

Slide 2: Goal. To motivate research in peer-to-peer information retrieval (P2P IR), and to model P2P IR in terms of a metasearch engine.

Slide 3: Model. Peers share data objects, each described by a descriptor (a bag of terms). Peers are connected in a random graph. Queries (also bags of terms) are routed to peers (servers), which return references to data objects $O$ such that $D_O \supseteq Q$, where $D_O$ is the descriptor of $O$ and $Q$ is the query. Each descriptor also contains the hash value of its data object.
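The matching rule in the model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the lowercase whitespace tokenizer, and the sample descriptors are hypothetical and do not come from the slides. A descriptor matches when it contains every query term at least as often as the query does.

```python
from collections import Counter


def tokenize(text):
    """Turn a string into a bag of terms (assumed: lowercase, whitespace-split)."""
    return Counter(text.lower().split())


def matches(query, descriptor):
    """Bag containment: every query term appears in the descriptor often enough."""
    return all(descriptor[term] >= count for term, count in query.items())


def local_search(query_text, shared_objects):
    """Return (hash, descriptor) references for a peer's matching objects."""
    query = tokenize(query_text)
    return [(h, d) for h, d in shared_objects if matches(query, tokenize(d))]


# Hypothetical objects shared by one peer: (hash value, descriptor text).
shared = [
    ("hash-a", "Mozart Piano Concerto 21"),
    ("hash-b", "Beethoven Symphony 5"),
]
results = local_search("mozart concerto", shared)
```

Because matching is conjunctive, a query such as "mozart symphony" returns nothing from this peer even though each term appears in some descriptor.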

Slide 4: Metadata Distribution Example. Assume Q = {Mozart, Concerto}. [Table: ungrouped results, with Hash Key and Sources columns.] All descriptors contain Q.

Slide 5: Motivation for Model. Peer-to-peer file sharing: millions of users, petabytes of data. Data objects are replicated, and each replica's descriptor is independently maintained.

Slide 6: Metasearch Engines. Metasearch engines search other search engines. Examples: dogpile.com, askjeeves.com.

Slide 7: Main Metasearch Engine Activities. Source selection: choosing which search engines to search. Query dispatching: translating a query to a local format. Result selection: picking from the multiple result sets. Result merging: unifying and ranking the selected results.

Slide 8: Source Selection. Metasearch engine: employs a profile of each search engine to make the decision. P2P file-sharing system: source selection is routing, done by flooding, by using statistics about neighbors, or with distributed hash tables. The cost is related to peer autonomy.
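Flooding, the first routing option listed, can be sketched as a breadth-first traversal of the peer graph. The hop limit (`ttl`), the duplicate suppression, and the five-peer graph below are assumptions made for illustration; the slides do not specify any of them.

```python
from collections import deque


def flood(graph, origin, ttl):
    """Return the set of peers reached by flooding a query from origin.

    graph maps each peer to a list of its neighbors; ttl is the maximum
    number of hops the query may travel (an assumed parameter).
    """
    reached = {origin}
    frontier = deque([(origin, ttl)])
    while frontier:
        peer, remaining = frontier.popleft()
        if remaining == 0:
            continue  # hop budget exhausted, do not forward further
        for neighbor in graph[peer]:
            if neighbor not in reached:  # forward each query to a peer once
                reached.add(neighbor)
                frontier.append((neighbor, remaining - 1))
    return reached


# Hypothetical overlay of five peers.
graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
one_hop = flood(graph, "a", 1)
two_hops = flood(graph, "a", 2)
```

The sketch needs no profile of any peer, which is the contrast with metasearch source selection: the price is that the number of contacted peers grows quickly with the hop limit.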

Slide 9: Query Dispatching. Metasearch engine: one search engine may use a vector space model while another uses a Boolean model. P2P file-sharing system: some search engines, such as eMule, access multiple networks.

Slide 10: Result Selection. Metasearch engine: some result lists might be pruned if they come from less relevant search engines; this uses search engine profiles. P2P file-sharing system: generally, all results are sent to the client.

Slide 11: Result Merging. Metasearch engine: uses the rankings from the individual lists and the profiles of the search engines. P2P file-sharing system: groups results and ranks them by likelihood of successful download, based on group size and connection quality.
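A minimal sketch of the P2P side of result merging follows, assuming each result arrives as a (hash key, descriptor, source) triple. Results with the same hash key are treated as replicas of one object and grouped, and groups are ranked by size only. Connection quality is left out, and the function name and sample results are hypothetical.

```python
from collections import defaultdict


def group_and_rank(results):
    """Group results by hash key and rank groups by size, largest first."""
    groups = defaultdict(list)
    for hash_key, descriptor, source in results:
        groups[hash_key].append((descriptor, source))
    # A larger group means more replicas, so a download is more likely to succeed.
    return sorted(groups.items(), key=lambda item: len(item[1]), reverse=True)


# Hypothetical ungrouped results for Q = {Mozart, Concerto}.
results = [
    ("h1", "mozart clarinet concerto", "peer1"),
    ("h2", "mozart piano concerto", "peer2"),
    ("h1", "concerto mozart k622", "peer3"),
]
ranked = group_and_rank(results)
```

Note that the two replicas of `h1` carry different descriptors, which reflects the point that each replica's descriptor is maintained independently.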

Slide 12: Example Search on Limewire's Gnutella. [Screenshot of search results, labeled with the query (number of results), descriptors, and group size.]

Slide 13: Basic Difference. Metasearch engines assume a fixed and reliable set of search engines, so they can collect statistics on those search engines to improve query processing and results.

Slide 14: P2P File Sharing Research Areas (1/2). Source selection: inexpensive routing with autonomous peers. Query dispatching: translating queries to maximize the precision and recall of the final result set.

Slide 15: P2P File Sharing Research Areas (2/2). Result selection: using queries and local statistics to prune returned results. Result merging: using replication and distributed metadata to improve rankings. Recall: link analysis for Web search.

Slide 16: Goals of Open Source in P2P File-Sharing Systems. Allow communal development of the technology: new routing techniques, new ranking functions. Disclose all functionality: better security, no spyware.

Slide 17: Examples of Openness in P2P File-Sharing. Gnutella is an open protocol. Limewire, Bearshare, Kazaa. Limewire publishes an open-source implementation of the Gnutella protocol. eMule is another open-source project, built on a competing protocol.

Slide 18: Conclusion. Many research areas: P2P file sharing can be modeled as a form of metasearch engine. High impact: many users and petabytes of data. An active open-source community already exists: there is a large community of users and a large body of source code.

Slide 19: Questions and Contact Information. Wai Gen Yee: ir.iit.edu/~waigen (recent results and publications). Information Retrieval Laboratory, Illinois Institute of Technology: ir.iit.edu