A Dynamic Routing Protocol for Keyword Search in Unstructured Peer-to-peer Networks Authors: Cong Shi, Dingyi Han, Yuanjie Liu, Shicong Meng, Yong Yu APEX.

Slides:



Advertisements
Similar presentations
Peer-to-Peer and Social Networks An overview of Gnutella.
Advertisements

Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan MIT and Berkeley presented by Daniel Figueiredo Chord: A Scalable Peer-to-peer.
Replication Strategies in Unstructured Peer-to-Peer Networks Edith Cohen Scott Shenker This is a modified version of the original presentation by the authors.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Presented By- Sayandeep Mitra TH SEMESTER Sensor Networks(CS 704D) Assignment.
UNIVERSITY OF JYVÄSKYLÄ Building NeuroSearch – Intelligent Evolutionary Search Algorithm For Peer-to-Peer Environment Master’s Thesis by Joni Töyrylä
Improving TCP Performance over Mobile Ad Hoc Networks by Exploiting Cross- Layer Information Awareness Xin Yu Department Of Computer Science New York University,
Modeling and Analysis of Random Walk Search Algorithms in P2P Networks Nabhendra Bisnik, Alhussein Abouzeid ECSE, Rensselaer Polytechnic Institute.
Introduction to Information Retrieval (Part 2) By Evren Ermis.
1 An Overview of Gnutella. 2 History The Gnutella network is a fully distributed alternative to the centralized Napster. Initial popularity of the network.
Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.
Farnoush Banaei-Kashani and Cyrus Shahabi Criticality-based Analysis and Design of Unstructured P2P Networks as “ Complex Systems ” Mohammad Al-Rifai.
LightFlood: An Optimal Flooding Scheme for File Search in Unstructured P2P Systems Song Jiang, Lei Guo, and Xiaodong Zhang College of William and Mary.
Mobile and Wireless Computing Institute for Computer Science, University of Freiburg Western Australian Interactive Virtual Environments Centre (IVEC)
Peer-to-Peer Networks João Guerreiro Truong Cong Thanh Department of Information Technology Uppsala University.
P2p, Spring 05 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems March 29, 2005.
A Distributed Search Service for Peer-to-Peer File Sharing in Mobile Application Presented by Tony Sung On Loy, MC Lab, CUHK IE 1 A Distributed Search.
Online Data Gathering for Maximizing Network Lifetime in Sensor Networks IEEE transactions on Mobile Computing Weifa Liang, YuZhen Liu.
UNIVERSITY OF JYVÄSKYLÄ Resource Discovery Using NeuroSearch Presentation for the Agora Center InBCT-seminar Mikko Vapa, researcher InBCT 3.2.
Peer-to-peer file-sharing over mobile ad hoc networks Gang Ding and Bharat Bhargava Department of Computer Sciences Purdue University Pervasive Computing.
 Structured peer to peer overlay networks are resilient – but not secure.  Even a small fraction of malicious nodes may result in failure of correct.
1CS 6401 Peer-to-Peer Networks Outline Overview Gnutella Structured Overlays BitTorrent.
P2P File Sharing Systems
INTRODUCTION TO PEER TO PEER NETWORKS Z.M. Joseph CSE 6392 – DB Exploration Spring 2006 CSE, UT Arlington.
Freenet. Anonymity  Napster, Gnutella, Kazaa do not provide anonymity  Users know who they are downloading from  Others know who sent a query  Freenet.
1 Napster & Gnutella An Overview. 2 About Napster Distributed application allowing users to search and exchange MP3 files. Written by Shawn Fanning in.
1 Route Table Partitioning and Load Balancing for Parallel Searching with TCAMs Department of Computer Science and Information Engineering National Cheng.
IR Techniques For P2P Networks1 Information Retrieval Techniques For Peer-To-Peer Networks Demetrios Zeinalipour-Yazti, Vana Kalogeraki and Dimitrios Gunopulos.
09/07/2004Peer-to-Peer Systems in Mobile Ad-hoc Networks 1 Lookup Service for Peer-to-Peer Systems in Mobile Ad-hoc Networks M. Tech Project Presentation.
An affinity-driven clustering approach for service discovery and composition for pervasive computing J. Gaber and M.Bakhouya Laboratoire SeT Université.
Peer to Peer Research survey TingYang Chang. Intro. Of P2P Computers of the system was known as peers which sharing data files with each other. Build.
Quasar A Probabilistic Publish-Subscribe System for Social Networks over P2P Kademlia network David Arinzon Supervisor: Gil Einziger April
1 Exploiting locality for scalable information retrieval in peer-to-peer networks D. Zeinalipour-Yazti, Vana Kalogeraki, Dimitrios Gunopulos Manos Moschous.
P2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Routing indexes A. Crespo & H. Garcia-Molina ICDCS 02.
Using the Small-World Model to Improve Freenet Performance Hui Zhang Ashish Goel Ramesh Govindan USC.
CCAN: Cache-based CAN Using the Small World Model Shanghai Jiaotong University Internet Computing R&D Center.
Structuring P2P networks for efficient searching Rishi Kant and Abderrahim Laabid Abderrahim Laabid.
Weighting and Matching against Indices. Zipf’s Law In any corpus, such as the AIT, we can count how often each word occurs in the corpus as a whole =
Quantitative Evaluation of Unstructured Peer-to-Peer Architectures Fabrício Benevenuto José Ismael Jr. Jussara M. Almeida Department of Computer Science.
GIA: Making Gnutella-like P2P Systems Scalable Yatin Chawathe Sylvia Ratnasamy, Scott Shenker, Nick Lanham, Lee Breslau (Several slides have been taken.
Load-Balancing Routing in Multichannel Hybrid Wireless Networks With Single Network Interface So, J.; Vaidya, N. H.; Vehicular Technology, IEEE Transactions.
A Peer-to-Peer Approach to Resource Discovery in Grid Environments (in HPDC’02, by U of Chicago) Gisik Kwon Nov. 18, 2002.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Robustness of complex networks with the local protection strategy against cascading failures Jianwei Wang Adviser: Frank,Yeong-Sung Lin Present by Wayne.
User-Centric Data Dissemination in Disruption Tolerant Networks Wei Gao and Guohong Cao Dept. of Computer Science and Engineering Pennsylvania State University.
The new protocol of freenet Taken from Ian Clarke and Oskar Sandberg (The Freenet Project)
O PTIMAL SERVICE TASK PARTITION AND DISTRIBUTION IN GRID SYSTEM WITH STAR TOPOLOGY G REGORY L EVITIN, Y UAN -S HUN D AI Adviser: Frank, Yeong-Sung Lin.
By Jonathan Drake.  The Gnutella protocol is simply not scalable  This is due to the flooding approach it currently utilizes  As the nodes increase.
LightFlood: An Efficient Flooding Scheme for File Search in Unstructured P2P Systems Song Jiang, Lei Guo, and Xiaodong Zhang College of William and Mary.
P2p, Fall 06 1 Topics in Database Systems: Data Management in Peer-to-Peer Systems Search in Unstructured P2p.
Rate-Based Channel Assignment Algorithm for Multi-Channel Multi- Rate Wireless Mesh Networks Sok-Hyong Kim and Young-Joo Suh Department of Computer Science.
Taxonomy Caching: A Scalable Low- Cost Mechanism for Indexing Remote Contents in Peer-to-Peer Systems Kjetil Nørvåg Norwegian University of Science and.
Algorithms and Techniques in Structured Scalable Peer-to-Peer Networks
Design of a Robust Search Algorithm for P2P Networks
Energy Efficient Data Management for Wireless Sensor Networks with Data Sink Failure Hyunyoung Lee, Kyoungsook Lee, Lan Lin and Andreas Klappenecker †
Authors: Ing-Ray Chen and Ding-Chau Wang Presented by Chaitanya,Geetanjali and Bavani Modeling and Analysis of Regional Registration Based Mobile Service.
Lecture 12 Huffman Algorithm. In computer science and information theory, a Huffman code is a particular type of optimal prefix code that is commonly.
A Multicast Routing Algorithm Using Movement Prediction for Mobile Ad Hoc Networks Huei-Wen Ferng, Ph.D. Assistant Professor Department of Computer Science.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
1 Along & across algorithm for routing events and queries in wireless sensor networks Tat Wing Chim Department of Electrical and Electronic Engineering.
Distributed Caching and Adaptive Search in Multilayer P2P Networks Chen Wang, Li Xiao, Yunhao Liu, Pei Zheng The 24th International Conference on Distributed.
Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications * CS587x Lecture Department of Computer Science Iowa State University *I. Stoica,
Peer-to-Peer Information Systems Week 12: Naming
Controlling the Cost of Reliability in Peer-to-Peer Overlays
任課教授:陳朝鈞 教授 學生:王志嘉、馬敏修
GIA: Making Gnutella-like P2P Systems Scalable
Paraskevi Raftopoulou, Euripides G.M. Petrakis
Peer-to-Peer Video Services
Joydeep Chandra, Santosh Shaw and Niloy Ganguly
Peer-to-Peer Information Systems Week 12: Naming
Presentation transcript:

A Dynamic Routing Protocol for Keyword Search in Unstructured Peer-to-peer Networks Authors: Cong Shi, Dingyi Han, Yuanjie Liu, Shicong Meng, Yong Yu APEX Data and Knowledge Management Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, , China. {cshi, handy, lyjgeorge, bill, Presented By Rizal Mohd Nor

Introduction P2P systems like Kazaa and Gnutella suffer from their significant network overhead on flooding queries among supernodes (superpeers). A lot of has been done to reduce network overhead and reduce network overhead and enhance query result quality. They can be categorize in two classes:  Blind routing (like flooding and random walks)  Routing indices (content-oriented routing indices and query oriented)

Introduction Turns out that query oriented routing indices shows better performance because it utilizes the historical information of queries and query hits to help route future queries. However, previously proposed query-oriented routing index is not practically effective learning method, it only achieves slow improvements in routing efficiency.

Contribution Propose a learning-based query routing protocol (LBQR) to achieve efficient query-oriented routing indices.  Utilizes mass peer behavior to learn the distribution of the contents that peers are interested in.  Reinforcement learning is applied to construct and maintain routing indices.  A formalized update strategy is adopted to form an accurate depiction of the content distribution.  Optimization methods are taken to improve the effect of routing and maintenance mechanism.

Related Work (1) Blind Routing  Expanding Ring Several successive flooding searches are initiated with increasing TTLs until enough responses are received.  LightFlood A tree-like sub-overlay is constructed to minimize the redundant messages and retain the same search scope as flooding  Random Walk A peer randomly chooses a neighbor to forward the query  Max-Degree Biased Walk A peer forwards queries to the highest-degree neighbor  Max-Feedback Biased Walk A peer forwards queries to the highest feedback neighbor

Related Work (2) Routing Indices  RI for P2P Contents are classified under “topics” and peers index the number of documents under each topic reachable through each neighbor.  One hop replication Peers index the contents shared by their neighbors  Scalable Query Routing (SQR) Exponential Decay Bloom Filter is designed to encode probabilistic routing tables in a highly compressed manner.

Related Work (3) Query Oriented Routing Indices  Adaptive Probabilistic Search (APS) each node keeps a local index consisting of one entry for each object it has requested or forwarded a request for, per neighbor Searching is based on the simultaneous deployment of k walkers and probabilistic forwarding. Every walker terminates with either a success or a failure. Upon walker termination, “optimistic” or “pessimistic” approach is applied to update the index. No extra network overhead is necessary to maintain the routing indices, and the content distribution can be gradually learned.

Learning Based Query Routing (LBQR) Overview Every peer assigns its connections a set of weights. Each one of the weights corresponds to a query string. Use connections instead of neighbors, because peers several hops away are taken into account as well as immediate neighbors. Query Q’s weight for connection C at peer A is a measure of how likely A considers that it will get satisfactory results from C after forwarding Q to C. With the Zipf distribution of queries in practical P2P systems, LBQR aims at improving the overall performance of query routing by enhancing that of few queries frequently issued in the P2P networks. Since the queries and query hits are the only information used to maintain LBQR, no excessive network cost is produced. Zipf's law states that given some corpus of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table. Thus the most frequent word will occur approximately twice as often as the second most frequent word, which occurs twice as often as the fourth most frequent word, etc.corpusnatural language inversely proportional

LBQR Algorithm

LBQR Query Forwarding (1) The query forwarding process determines:  what the peer initiating the query gets;  and further affects the maintenance process of LBQR. When a peer initiates or receives a query, it should determine both how to route the query and how many copies of the query to be transmitted. When peer A is going to transmit k copies of query Qj, it chooses k connections one after another. Let A has M candidate connections C1,C2,..., CM (M > 0) to forward the query. weight(Ci,Qj) indicates query Qj ’s weight that peer A assigns to connection Ci. Then, peer A will choose Ci to route query Qj with the forwarding probability below:

LBQR Query Forwarding (2) When a peer receives a query, it first checks the query’s id. If the query is a redundant one, it simply drops it. The number of copies transmitted by a peer is where α is a predefined constant, and Hop is the hop distance between current peer and the query originator. The peer initiating the query forwards α copies, while the peers α hop away stop the routing process

LBQR Routing Indices construction and maintenance (1) After a peer forwards the query copies to some connections, it will get feedback from those connections. The expected value of feedback is a good estimation of the distribution of the contents matching a particular query string. After a peer forwards the query copies to some connections, it will get feedback from those connections. f A (Ci,Qj) indicates the feedback for query Q j that peer A gets from connection C j. It includes two parts, the query hit returned by immediate neighbor B and the feedbacks received by B. The number of contents, Num B, is used to represent the importance of the query hit returned by B. To peer A, the feedbacks received by B are not so determinate as what B responses.

LBQR Routing Indices construction and maintenance (2) The maintenance of visit n (C i,Q j ) reflects the importance of weight n (C i,Q j ). Therefore, it is much more proper to use the number of forwarded query copies to measure the importance of feedback. Suppose peer A is Hop n hop away from the originator. According to equation of # Query, the maintenance strategy for visit n (C i,Q j ) can be expressed as:

Optimization Multi-Keyword Queries (1) In LBQR, multi-keyword queries are treated the same as the single keyword queries. However, if the extra information of multi- keyword queries can be utilized, it will greatly enhance the average search efficiency. In dealing with the multi-keyword queries, the information of every keyword in the query can be used to help routing the query and achieving the content distribution for the query.

Optimization Multi-Keyword Queries (2) Suppose query Q is composed of keywords K 1, K 2,...,K n which are independent from each other. weight(C i,K j ) indicates the weight of K j for connection C i recorded in the indices. Then the combined distribution can be expressed as:

Optimization Rough Description of Content Distribution (1) LBQR gradually learns the distribution of contents which are of interest to peers, and the search efficiency is improved with the learning process. Therefore, when only a few queries are issued, it fails to achieve high search efficiency. However, if we can get a rough description of content distribution when the query is first seen, high search efficiency can be achieved even if the query is not very popular. If a query is flooded among peers, the accurate distribution of contents matching the query can be achieved. If a query is first seen to a peer, it floods the query within two hops. Then, the content distribution within the region is gained. LBQR helps peers to achieve the rough description of content distribution with only a little extra network overhead.

Results (1) Various Content Popularity: LBQR’s recall is generally constant despite of the content popularity variation. Its response rate is related to the content popularity as well as recall value. Therefore, in the case when recall is almost constant, it is proportional to the content popularity.

Results (2) Learning Rate: Figure demonstrates the learning rate of LBQR and APS when α = 6 and the number of contents is 100. It shows that LBQR’s response rate dramatically increases when a few queries are issued, and it quickly arrives at a stable state. At the meantime, APS’s response rate is almost proportional to the number of queries issued.

Results (3) Multi- Keyword Queries

Conclusion LBQR has a lot of advantages over previously proposed routing mechanisms. First, the mechanism manages to quickly achieve high routing efficiency, which ensures its practical deployment. Second, no excessive network overhead is consumed to construct and maintain the routing indices. Due to the dynamic nature of P2P networks, content-oriented routing indices should be frequently updated, which results in huge network overhead. Third, the construction of routing indices is based on what peers search, instead of what peers share. Therefore, only a little memory is used to record the routing information of contents which are of little interest to peers in the network.