PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities F. M. Cuenca-Acuna, C. Peery, R. P. Martin, and T. D.


1 PlanetP: Using Gossiping to Build Content Addressable Peer-to-Peer Information Sharing Communities
F. M. Cuenca-Acuna, C. Peery, R. P. Martin, and T. D. Nguyen
Speaker: Sergey Lugin. Seminar “Peer-to-peer Information Systems”, January 2003

2 Outline
1. Introduction
2. Architecture: local and global index; gossiping algorithm; content search and ranking
3. Performance: content search and ranking; gossiping algorithm
4. Group extension
5. Related works and summary

3 Introduction
PlanetP is a content-addressable publish/subscribe service for unstructured peer-to-peer (P2P) communities: no central management, resilience to rapid membership changes, content search and ranking, and scaling up to several thousand peers.
PlanetP features:
a gossiping layer used to globally replicate an extremely compact content index;
a completely distributed content search and ranking algorithm that helps users find the most relevant information.

5 Local index
The contents of peer files (video, music) are described in XML snippets that contain pointers to the corresponding files.
Each peer runs a simple web server so that other peers can retrieve these files (discussed later).
Each peer summarizes the set of unique terms in its local index in a Bloom filter.
[Figure: local files, XML snippets, and a simple web server; the local index maps unique terms to pointers and is summarized as a Bloom filter bit array.]

6 Bloom filter
A Bloom filter is a bit array that allows fast membership tests against a large term set using hash functions.
Advantages: a compact representation of a large term set; fast lookups.
Limitations: only a Boolean answer (yes/no); false positives are possible.
[Figure: the terms "paper", …, "cat" are passed through the hash functions to set bits in an initially all-zero bit array; the query term "cat" is passed through the same hash functions and the filter answers yes.]
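The Bloom filter described on this slide can be sketched in a few lines. This is a minimal illustration, not PlanetP's implementation: the class name `BloomFilter`, the sizes `m` and `k`, and the use of salted SHA-256 as the hash family are all assumptions made for the example.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: m bits and k hash functions (sizes are illustrative)."""

    def __init__(self, m=64, k=3):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, term):
        # Derive k bit positions by salting the term with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{term}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, term):
        for pos in self._positions(term):
            self.bits[pos] = 1

    def might_contain(self, term):
        # False means "definitely absent"; True only means "possibly present".
        return all(self.bits[pos] for pos in self._positions(term))

bf = BloomFilter()
for term in ["paper", "cat"]:
    bf.add(term)
print(bf.might_contain("cat"))  # an added term always tests positive
```

A term that was never added usually tests negative, but it can test positive when its bit positions collide with those of other terms; that is the false-positive problem the slide mentions.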

7 Global index
Global directory (index):
Nickname  Status    IP             Bloom filter
Bob       ON-LINE   10.25.125.10   BF[…….]
Gera      OFF-LINE  10.67.12.101   BF[…….]
…         …         …              …
Fred      ON-LINE   45.23.78.115   BF[…….]
Juri      ON-LINE   125.37.167.15  BF[…….]
Tan       ON-LINE   10.25.125.11   BF[…….]
The global directory describes all peers and all available information in the community.
The global directory is replicated everywhere by gossiping.
Gossiping is also used to keep peers synchronized when peers join or leave and when new data appears.

8 Gossiping algorithm
PlanetP uses gossiping to replicate the global index across the peers of a P2P community. This provides robustness to the dynamic joining and leaving of peers, and independence from any particular subset of peers being on-line.
PlanetP's gossiping algorithm combines:
a rumoring algorithm and an anti-entropy algorithm (Demers' algorithm);
a partial anti-entropy algorithm.

9 Rumoring algorithm
Purpose: the algorithm spreads new information across a P2P community.
A peer P_x has a change (a rumor):
every T_g seconds, P_x pushes this change to a peer P_y chosen randomly from the global index;
if the rumor is new information for P_y, then P_y starts to push the rumor just like P_x;
P_x stops pushing the rumor after it has contacted n consecutive peers that have already heard it.
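The push-and-stop rule of the rumoring algorithm can be tried out in a small round-based simulation. This is a sketch under simplifying assumptions: all peers push in lock step (one loop iteration stands in for one T_g interval), every peer knows every other peer, and the function name `spread_rumor` and its parameters are hypothetical.

```python
import random

def spread_rumor(num_peers, n_stop, seed=0):
    """Simulate push rumoring; return (rounds, number of peers informed)."""
    rng = random.Random(seed)
    knows = [False] * num_peers
    knows[0] = True              # peer 0 originates the rumor
    misses = {0: 0}              # active pusher -> consecutive "already heard" count
    rounds = 0
    while misses:
        rounds += 1              # one round stands in for one T_g interval
        for px in list(misses):
            py = rng.choice([p for p in range(num_peers) if p != px])
            if knows[py]:
                misses[px] += 1
                if misses[px] >= n_stop:
                    del misses[px]   # P_x stops pushing the rumor
            else:
                knows[py] = True
                misses[px] = 0       # the streak of consecutive misses is broken
                misses[py] = 0       # P_y starts pushing just like P_x
    return rounds, sum(knows)

rounds, informed = spread_rumor(num_peers=100, n_stop=2)
print(rounds, informed)
```

With a small `n_stop`, a run can end with `informed` below `num_peers`: the rumor died out before reaching everyone, which is the gap anti-entropy is meant to close.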

10 Anti-entropy algorithm
Purpose: the algorithm prevents rumors from dying out before reaching everyone.
All peers:
every T_r seconds, a peer P_x attempts to pull information from a peer P_y chosen randomly from the global index;
P_y returns a summary of its version of the global index;
P_x can then ask P_y for any new information that it does not have.
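One pull exchange of the anti-entropy step might look as follows. The slide does not say what the summary contains, so this sketch assumes each directory entry carries a version number and that the summary is a map from peer name to version; `summarize` and `anti_entropy_pull` are hypothetical names.

```python
def summarize(index):
    """Compact summary: peer name -> version of that peer's entry (assumed format)."""
    return {name: entry["version"] for name, entry in index.items()}

def anti_entropy_pull(index_x, index_y):
    """P_x pulls from P_y every entry that is missing or older in its own index."""
    summary_y = summarize(index_y)            # step 1: P_y returns its summary
    wanted = [name for name, version in summary_y.items()
              if name not in index_x or index_x[name]["version"] < version]
    for name in wanted:                       # step 2: P_x requests what it lacks
        index_x[name] = dict(index_y[name])
    return wanted

index_x = {"Bob": {"version": 1, "status": "ON-LINE"}}
index_y = {"Bob": {"version": 2, "status": "OFF-LINE"},
           "Fred": {"version": 1, "status": "ON-LINE"}}
print(sorted(anti_entropy_pull(index_x, index_y)))  # ['Bob', 'Fred']
```

Only the summary crosses the network unconditionally; full entries are transferred just for the names in `wanted`.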

11 Partial anti-entropy algorithm
Purpose: the algorithm reduces the time needed to spread new information.
It extends each push operation:
a peer P_x pushes a rumor to a peer P_y;
P_y piggybacks the identifiers of a small number of the most recent rumors on its reply;
P_x can then pull any recent rumor that did not reach it.
The process requires only one extra message exchange, and only when P_y knows something that P_x does not.

12 Membership
Joining of new peers: announced by gossiping.
Leaving of present peers:
a peer discovers that another peer is OFF-LINE when an attempt to communicate with it fails; the peer's status is marked as OFF-LINE, but its information in the global index is not dropped;
if a peer has been marked as OFF-LINE continuously for a time T_Dead, it is assumed to have left the community permanently, and all information about the peer is dropped.
Rejoining of peers: announced by gossiping (the peer's status is marked as ON-LINE).
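The membership rules can be expressed as three small state transitions on a directory entry. The names `PeerEntry`, `on_contact_failed`, `on_rejoin`, and `expire_dead` are hypothetical, and time is passed in explicitly as a number of seconds to keep the sketch deterministic.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PeerEntry:
    status: str = "ON-LINE"
    offline_since: Optional[float] = None

def on_contact_failed(directory, name, now):
    """A failed contact marks the peer OFF-LINE but keeps its entry."""
    entry = directory[name]
    if entry.status != "OFF-LINE":
        entry.status = "OFF-LINE"
        entry.offline_since = now

def on_rejoin(directory, name):
    """A rejoining peer is marked ON-LINE again."""
    entry = directory[name]
    entry.status = "ON-LINE"
    entry.offline_since = None

def expire_dead(directory, now, t_dead):
    """Drop every peer that has been OFF-LINE continuously for at least t_dead."""
    dead = [name for name, e in directory.items()
            if e.status == "OFF-LINE" and now - e.offline_since >= t_dead]
    for name in dead:
        del directory[name]
    return dead

directory = {"Gera": PeerEntry(), "Bob": PeerEntry()}
on_contact_failed(directory, "Gera", now=0.0)
print(expire_dead(directory, now=3600.0, t_dead=3600.0))  # ['Gera']
```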

13 Content search and ranking
PlanetP implements a content ranking algorithm that uses the vector space ranking model.
Vector space ranking model:
each document and query is abstractly represented as a vector;
each dimension is associated with a distinct term;
the value of each component of a vector is the weight representing the importance of that term to the corresponding document or query.
Notation: Q is a query, D is a document, t is a term, w_Q,t is the weight of the term t for the query Q, and w_D,t is the weight of the term t for the document D.
The relevance of a document to a query is the cosine of the angle between the query vector and the document vector.
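The cosine measure from the vector space model can be computed directly on sparse term-weight vectors. This sketch assumes vectors are stored as dictionaries from term to weight; the function name `cosine_similarity` is chosen for the example.

```python
import math

def cosine_similarity(query_vec, doc_vec):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * doc_vec.get(term, 0.0) for term, w in query_vec.items())
    norm_q = math.sqrt(sum(w * w for w in query_vec.values()))
    norm_d = math.sqrt(sum(w * w for w in doc_vec.values()))
    if norm_q == 0.0 or norm_d == 0.0:
        return 0.0               # an empty vector matches nothing
    return dot / (norm_q * norm_d)

query = {"gossip": 1.0, "index": 1.0}
doc = {"gossip": 2.0, "bloom": 1.0}
print(round(cosine_similarity(query, doc), 3))
```

Only terms shared by the query and the document contribute to the dot product, so a document with no query terms scores 0.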

14 Content ranking algorithm: background
TFxIDF is a popular method for assigning term weights.
TF is the term frequency: how often the term appears in the document.
IDF is the inverse document frequency: the inverse of how often the term appears in the entire collection.
This technique balances two facts:
terms used frequently in a document are likely to be important for describing its meaning;
terms that appear in many documents of a collection are not useful for differentiating between these documents.
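The balance between TF and IDF can be seen in a small computation. The slide does not give exact formulas, so this sketch assumes one common log-scaled variant, weight = (1 + ln f_D,t) × ln(1 + N / f_t), where f_D,t is the count of t in D and f_t is the number of documents containing t; other variants exist, and `tfidf_weights` is a hypothetical name.

```python
import math
from collections import Counter

def tfidf_weights(docs):
    """docs: doc_id -> list of terms. Returns doc_id -> {term: weight}."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(t for terms in docs.values() for t in set(terms))
    weights = {}
    for doc_id, terms in docs.items():
        tf = Counter(terms)
        weights[doc_id] = {
            t: (1 + math.log(f)) * math.log(1 + n / df[t])  # assumed TF and IDF forms
            for t, f in tf.items()
        }
    return weights

docs = {"A": ["cat", "cat", "paper"], "B": ["paper"]}
w = tfidf_weights(docs)
print(w["A"]["cat"] > w["A"]["paper"])  # True
```

In document A, "cat" outweighs "paper" for both reasons named on the slide: it occurs more often in A, and it occurs in fewer documents of the collection.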

15 Approximating TFxIDF
How can this technique be used in a P2P community?
Problems with computing TFxIDF in a P2P community:
documents are distributed across the community;
peer bandwidth is limited.
PlanetP's proposed solution is an approximation of TFxIDF that splits the task into two sub-problems:
1. Ranking peers according to their likelihood of having relevant documents
2. Deciding on the number of peers to contact and ranking the identified documents

16 PlanetP’s content search
Steps:
1. Ranking peers
2. Querying the most relevant peers
[Figure: the query is matched against the Bloom filters in the global directory (Bob, Gera, …, Fred, Juri, Tan), which yields a ranking of peers (Fred, Bob, Tan); the top-ranked peers (Fred, Bob) are queried and return ranked results: Doc-A 0.93, Doc-E 0.79, Doc-D 0.77.]

17 Ranking peers
1. Ranking peers
PlanetP uses a new measure similar to IDF, the Inverse Peer Frequency (IPF): a term that is present in the Bloom filter of every peer is not useful for differentiating between the peers for a particular query.
Notation: N is the number of all peers; N_t is the number of peers having the term t.
[Formulas: the IPF for a term t; the rank of a peer for a query Q.]
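Peer ranking by IPF can be sketched as follows. The formulas are not preserved in this transcript, so the code assumes an IDF-like form, IPF_t = ln(1 + N / N_t), and ranks each peer by the sum of the IPF values of the query terms found in its filter. Plain Python sets stand in for the Bloom filters, and `rank_peers` is a hypothetical name.

```python
import math

def rank_peers(query_terms, peer_filters):
    """peer_filters: name -> container supporting `term in container`."""
    n = len(peer_filters)
    ranks = {name: 0.0 for name in peer_filters}
    for term in set(query_terms):
        holders = [name for name, bf in peer_filters.items() if term in bf]
        if not holders:
            continue                           # no peer advertises this term
        ipf = math.log(1 + n / len(holders))   # assumed IDF-like form of IPF
        for name in holders:
            ranks[name] += ipf
    return sorted(ranks.items(), key=lambda kv: kv[1], reverse=True)

peers = {
    "Bob": {"cat", "paper"},
    "Fred": {"cat", "gossip"},
    "Tan": {"paper"},
}
ranking = rank_peers(["cat", "gossip"], peers)
print([name for name, _ in ranking])  # ['Fred', 'Bob', 'Tan']
```

Because real Bloom filters can return false positives, a peer may be ranked for a term it does not actually hold; the ranking is a likelihood estimate, not a guarantee.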

18 Querying peers
2. Querying the most relevant peers
Problem: as communities grow, it becomes infeasible to contact large subsets of peers for each query.
Solution:
for a query Q, the user specifies a limit K on the number of potential documents that should be presented;
PlanetP sorts the ranked peer list and contacts the most relevant peers first;
each contacted peer returns a set of document URLs together with their relevance.

19 Querying peers
2. Querying the most relevant peers
Sorted peer list:
Name  Peer rank
Bob   0.84
Gera  0.77
…     …
Fred  0.34
Ranking results (top K ranked documents):
Name   Rank
Doc-A  0.93
Doc-E  0.79
Doc-D  0.77
Stopping heuristic: PlanetP stops contacting peers when the documents identified by p consecutive peers fail to contribute to the top K ranked documents.
Simulation results showed that p should be a function of the community size N and of K, with constant values C_0, C_1, C_2.
[Formula: p as a function of N and K.]
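The stopping heuristic can be sketched as a loop over the sorted peer list. Since the formula for p is not preserved here, p is simply passed in as a parameter; `query_ranked_peers` and the `fetch` callback (which stands in for contacting a peer) are hypothetical, and the peers P3 and P4 are made up for the example.

```python
def query_ranked_peers(ranked_peers, fetch, k, p):
    """Contact peers in rank order; stop once p consecutive peers
    fail to change the current top-k documents.

    fetch(peer) returns a list of (doc_url, relevance) pairs.
    Returns (top-k documents, number of peers contacted)."""
    top = []          # current top-k as (relevance, doc_url), best first
    idle = 0          # consecutive peers that contributed nothing
    contacted = 0
    for peer in ranked_peers:
        contacted += 1
        candidates = set(top) | {(rel, doc) for doc, rel in fetch(peer)}
        merged = sorted(candidates, reverse=True)[:k]
        if merged == top:
            idle += 1
            if idle >= p:
                break
        else:
            idle = 0
            top = merged
    return [(doc, rel) for rel, doc in top], contacted

results = {
    "Bob": [("Doc-A", 0.93), ("Doc-E", 0.79)],
    "Gera": [("Doc-D", 0.77)],
    "P3": [("Doc-X", 0.10)],
    "P4": [("Doc-Y", 0.05)],
    "Fred": [("Doc-Z", 0.50)],
}
docs, contacted = query_ranked_peers(list(results), results.__getitem__, k=3, p=2)
print(docs, contacted)
```

In this run the search stops after P4, so Fred is never contacted; a larger p trades more messages for a lower chance of missing a relevant document held by a low-ranked peer.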

21 Performance
Performance study: content search; gossiping: efficacy, time, bandwidth usage.
The performance study was based on a purpose-built simulator.
The simulator was validated against measurements taken from a prototype (up to several hundred peers).

22 Content search and ranking algorithm
Metrics: recall (R), precision (P).
Input data: the AP89 collection, extracted from the TREC collection (Associated Press):
No. docs = 84678
No. unique terms = 129603
Document-to-peer distributions:
Uniform (the worst case for a distributed search)
Weibull (7% of the users in the Gnutella community share more files than all the rest together)
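For reference, the two metrics follow their standard information-retrieval definitions: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. A minimal sketch, with `precision_recall` as a hypothetical helper name:

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieved set against a relevant set."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

p, r = precision_recall(["A", "E", "D", "X"], ["A", "D", "Q"])
print(p, round(r, 3))  # 0.5 0.667
```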

23 Content search and ranking algorithm
[Figures: recall and precision versus the number of documents requested, with curves T.W, P.W, and P.U.]
T.W is a search engine using TFxIDF (centralized implementation).
P.W is PlanetP’s search engine (Weibull distribution of documents).
P.U is PlanetP’s search engine (Uniform distribution of documents).

24 Content search and ranking algorithm
[Figures: axis labels are No. peers contacted, Recall, No. documents requested, and Number of peers; annotations: community of 400 peers, stopping heuristic.]
T.W is a search engine using TFxIDF (centralized implementation).
P.W is PlanetP’s search engine (Weibull distribution of documents).
P.U is PlanetP’s search engine (Uniform distribution of documents).

25 Observations
1. PlanetP tracks the performance of the centralized implementation closely. Performance is independent of how the shared documents are distributed. PlanetP’s recall and precision are within 11% of the TFxIDF implementation's.
2. PlanetP scales well for communities of up to 1000 peers, maintaining relatively constant recall and precision.
3. PlanetP’s stopping heuristic keeps recall and precision close to the centralized values regardless of how the documents are distributed.

26 Gossiping algorithm
Measured factor: propagation time. Network: 45 Mbps.
LAN-AE: peers use only push anti-entropy: each peer periodically pushes a summary of its data structure, and the target requests all new information from this summary.
LAN: peers use PlanetP’s gossiping algorithm.
PlanetP’s parameters:
Parameter                Value
Base gossiping interval  30 sec
Message header size      3 bytes
1000 terms BF            3000 bytes
BF summary               6 bytes
Peer summary             48 bytes
[Figure: time (sec) versus No. peers for LAN-AE and LAN (PlanetP). Time required to propagate a single Bloom filter everywhere.]

27 Gossiping algorithm
PlanetP’s gossiping algorithm with different gossiping intervals: 10 sec (DSL-10), 30 sec (DSL-30), 60 sec (DSL-60). Network: 512 Kbps.
Measured factors: propagation time, average bandwidth.
[Figures: time (sec) and average bandwidth (bytes/s) plotted against No. peers, with curves DSL-10, DSL-30, and DSL-60.]

28 Observations
1. The algorithm significantly outperforms ones that use only push anti-entropy, in both propagation time and network volume.
2. Propagation time is a logarithmic function of community size.
3. The total number of bytes sent is very modest, implying that gossiping is very scalable.
4. Propagation time can easily be traded off against gossiping bandwidth by increasing or decreasing the gossiping interval.

30 Group extension: gossiping
The community is divided into a number of groups (the figure shows groups A, B, and C).
Peers within the same group operate as described above.
Peers from different groups gossip an attenuated Bloom filter that summarizes the global index of their group.

31 Group extension: content search
A peer P_A1 (group A) tries to find documents that are relevant to a query Q.
The Bloom filter of group C contains terms relevant to the query Q.
P_A1 sends the query to a random peer P_Ci from group C.
P_Ci returns a ranked list of peers in group C:
Name  Peer rank
P_C3  0.84
P_C2  0.77

33 Related works
Tapestry, Pastry, Chord, and CAN use distributed hash tables (DHTs), which store key-value pairs and provide search mechanisms based on the key.
Problem: the high cost of publishing thousands of keys per file.
CORI and GlOSS address the problems of database selection and ranking fusion on distributed collections, and use servers to keep a reduced index of the contents.
Problems: the need for centralized resources; the possibility of a single point of failure.

34 Summary
The first work that supports content ranking.
A content-addressable publish/subscribe service for unstructured P2P communities.
The gossiping algorithm propagates shared information everywhere and provides robustness to dynamic peer behavior.
The content search and ranking algorithm provides search capabilities comparable with centralized resources and operates independently of how documents are distributed throughout the community.

35 Questions?

