Improving Search in P2P Networks By Shadi Lahham.

1 Improving Search in P2P Networks By Shadi Lahham

2 Improving P2P Search2 Purpose of This Lecture General understanding of P2P systems Appreciating the need for efficient search Applying different search techniques to different scenarios

3 Improving P2P Search3 Table Of Contents P2P Basics –What Is P2P –Advantages of P2P –Types of P2P Systems –Shortcomings Search Methods –The Search Problem –Current Methods –Suggested Methods Experimental Setup –Metrics –Data Collection –Calculating Costs Analysis of Results Conclusions

4 Introduction P2P Basics

5 Improving P2P Search5 What is P2P Distributed system Peers (nodes) are servers and clients simultaneously Peers are of equal roles Resources shared across peers No central server needed Examples of P2P system

6 P2P Overview (figure: lookup table with columns File and Key; rows file1 → f1, file2 → f2, file3 → f3)

7 Advantages of P2P
P2P vs. Centralized Servers
– Distributes disk space / bandwidth
– Inexpensively scalable
– Self-organized (autonomous)
– Load balancing
– Adaptive / fault tolerant
– Less susceptible to attacks
– Allows for redundancy

8 Types of P2P Systems
Hybrid (Napster)
Pure (Gnutella)
Super Peers (KaZaA)

9 Hybrid (Napster)

10 Pure (Gnutella)

11 Super Peers (KaZaA)
Make use of heterogeneity
– Powerful peers serve as super peers
– Weaker peers act as clients
Super-peers index clients’ files
– Requires updates on join/leave/update
Queries handled at super-peer level
– Saves query costs

12 Super Peers (KaZaA)

13 Improving P2P Search13 Hybrid - Shortcomings High cost on centralized index Performance & scalability bottleneck Needs maintenance Vulnerable ! Highly visible target

14 Improving P2P Search14 Pure - Shortcomings Inefficient search (flooding) Heterogeneity of peers not considered –Bottlenecks (limited peers) –Fragmentation

15 Improving P2P Search15 Super Peers - Shortcomings Super nodes might become bottlenecks for clients –requires redundancy Bad selection of supernodes might cause even worse problems

16 Search Methods

17 Improving P2P Search17 The Search Problem Connected graph Might contain cycles Individual node doesn’t know structure Only knows its neighbors No idea where data can be found

18 Improving P2P Search18 The Search Problem Goal : Find as many occurrences of the data using min time and resources Solution : –BFS ? –Bounded BFS ? –(naive approaches)

19 Bounded BFS Search (figure labels: TTL=2, TTL=1, TTL=0)

20 Bounded BFS Search
Messages get a global TTL (time to live)
Algorithm
– Source broadcasts a message to a subset of neighbors
– Neighbors search locally. Results are sent to source if found
– TTL = TTL – 1
– As long as TTL > 0, nodes forward the message to their neighbors
Downside: wastes bandwidth / processing
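
The algorithm above can be sketched in a few lines of Python. This is a minimal simulation, not Gnutella's implementation: the `graph` and `data` dictionaries, the function name, and the choice to flood every neighbor (rather than a subset) are assumptions made for illustration. TTL is interpreted here as the number of hops the query may still travel.

```python
from collections import deque

def bounded_bfs(graph, data, source, query, ttl):
    """Flood `query` from `source`, decrementing the TTL at every hop.

    graph: dict mapping node -> list of neighbors (hypothetical layout)
    data:  dict mapping node -> set of items stored at that node
    Returns (hits, messages): nodes holding the item, and messages sent.
    """
    hits = []
    messages = 0
    seen = {source}                      # each node processes a query once
    frontier = deque([(source, ttl)])
    while frontier:
        node, remaining = frontier.popleft()
        if remaining == 0:
            continue                     # TTL exhausted: do not forward
        for neighbor in graph[node]:
            messages += 1                # every forward costs bandwidth,
                                         # including sends back to the sender
            if neighbor in seen:
                continue                 # duplicate delivery over a cycle
            seen.add(neighbor)
            if query in data.get(neighbor, ()):
                hits.append(neighbor)    # local search succeeded
            frontier.append((neighbor, remaining - 1))
    return hits, messages
```

The message counter makes the downside visible: on a graph with cycles, many forwards reach nodes that have already seen the query and are simply dropped.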

21 Improving P2P Search21 Current Methods Gnutella - BFS –High cost –Gets complete results ( for depth D) –Relatively short time Freenet - DFS –Poor response time –Minimizes BW costs

22 Improving P2P Search22 Suggested Methods Iterative deepening Directed BFS Local Indices

23 Iterative Deepening
Idea:
– Search at a small depth and increase if required
– Aims to minimize the cost of BFS without detracting from its ability to satisfy queries
Notice that, given enough iterations, this method returns 100% of the results of BFS

24 Improving P2P Search24 Iterative Deepening (cont…) Elements : –Policies P={a,b,c,..} define deepening behavior –BFS is run to depth a and frozen –If source is satisfied it stops the process –Otherwise it asks BFS to resume to depth b –Process is repeated until source satisfied or we reach the last policy item

25 Iterative Deepening (cont…)
Elements:
– We can specify how long to wait between iterations
– We need a system-wide message ID to identify individual messages
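
The policy-driven deepening described on these slides might look like the sketch below. It is a simulation under stated assumptions: the BFS levels are precomputed with global knowledge (a real peer has none), the waiting time W between iterations is not modeled, the policy is assumed to be increasing, and all function and parameter names are hypothetical. `z` stands for the number of results needed to satisfy the source.

```python
def bfs_levels(graph, source):
    """Return a list whose entry n-1 holds the nodes first reached at hop n."""
    seen = {source}
    frontier = [source]
    levels = []
    while frontier:
        next_frontier = []
        for node in frontier:
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    next_frontier.append(neighbor)
        if next_frontier:
            levels.append(next_frontier)
        frontier = next_frontier
    return levels

def iterative_deepening(graph, data, source, query, policy, z):
    """Deepen through `policy` until at least `z` results are found.

    Returns (results, depth_reached).
    """
    levels = bfs_levels(graph, source)
    results = []
    processed = 0                        # depth at which the search is frozen
    for depth in policy:
        # Resume from the frozen frontier: only the new levels are searched.
        for level in levels[processed:depth]:
            results.extend(n for n in level if query in data.get(n, ()))
        processed = depth
        if len(results) >= z:
            break                        # source is satisfied, stop deepening
    return results, processed
```

Because each iteration resumes from the frozen frontier, nodes searched in an earlier iteration are not searched again.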

26 Improving P2P Search26 Example P={1,3,4} W=1

27 Improving P2P Search27 Directed BFS Idea: –Choose a subset of neighbors to query –Neighbors will BFS as usual –Aims to provide a balance between good response time and results –Minimize costs of full BFS Notice that only a subset of possible results are returned so we might fail to satisfy query

28 Directed BFS Example (figure labels: TTL=2, TTL=1, TTL=0)

29 Directed BFS (cont…)
But which neighbors to pick?
– Maintain simple statistics on neighbors to derive heuristics
 Highest past results
 Lowest average hops (close to nodes containing useful data)
 High message count (stable, can handle large flow)
 Shortest message queue (a long queue implies saturation)
More to come …
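
One way to picture the neighbor-selection step is a small ranking function over per-neighbor statistics. The sketch below is illustrative only: the `NeighborStats` fields, the heuristic names, and the parameter `k` (how many neighbors to query) are assumptions, not definitions taken from the slides.

```python
from dataclasses import dataclass

@dataclass
class NeighborStats:
    name: str
    past_results: int      # results returned over recent queries
    avg_hops: float        # average hops taken by those results
    messages: int          # messages received from this neighbor
    queue_length: int      # current length of its message queue

# Each heuristic maps stats to a sort key; the smallest key is preferred.
HEURISTICS = {
    "most_results": lambda s: -s.past_results,
    "fewest_hops": lambda s: s.avg_hops,
    "most_messages": lambda s: -s.messages,
    "shortest_queue": lambda s: s.queue_length,
}

def pick_neighbors(stats, heuristic, k=1):
    """Return the names of the `k` best neighbors under `heuristic`."""
    key = HEURISTICS[heuristic]
    return [s.name for s in sorted(stats, key=key)[:k]]
```

The source then sends the query only to the selected neighbors, which continue with an ordinary BFS.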

30 Improving P2P Search30 Local Indices Idea: –Nodes hold metadata of all nodes at radius r –Can process query at a few nodes, but get same number of results –Aims to balance satisfaction / costs

31 Local Indices
Elements:
– Policies P={a,b,c,..} define the depths at which we search
 Example P={1,5,6}
 Nodes at depth 1 process the query
 Nodes at depth 2,3,4 forward without processing
 Policy ends at depth 6
– System-wide radius r (small, ~50K metadata)
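
The interplay between the policy and the radius r can be explored with a small helper. The sketch rests on a simplifying assumption that is not stated in the slides: when every node at depth d processes the query against an index of radius r, the data of all nodes at depths d through d + r is counted as covered. The function names and the `max_depth` parameter are hypothetical.

```python
def covered_depths(policy, r, max_depth):
    """Depths whose data is reachable through the indices of processing nodes.

    Assumes processing at depth d covers depths d .. d + r (capped at max_depth).
    """
    covered = set()
    for depth in policy:
        covered.update(range(depth, min(depth + r, max_depth) + 1))
    return sorted(covered)

def smallest_covering_radius(policy, max_depth):
    """Smallest r for which depths 1 .. max_depth are all covered, else None."""
    wanted = list(range(1, max_depth + 1))
    for r in range(max_depth + 1):
        if covered_depths(policy, r, max_depth) == wanted:
            return r
    return None
```

Under this model, a sparser policy saves processing at intermediate depths but requires a larger radius, and therefore larger indices, to keep the same coverage.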

32 Improving P2P Search32 Example P={1,4} Process Don’t process r = ?

33 Improving P2P Search33 Local Indices (cont…) –Notice that now there is an overhead –On Join Send join message of TTL = r Direct Exchange of metadata –On leave / timeout remove metadata of gone / dead nodes –On Update Send update message of TTL = r

34 Experimental Setup

35 Improving P2P Search35 Metrics How to compare methods ? 1.Costs 2.Results 3.Time

36 Improving P2P Search36 Metrics 1. Costs –We do not base cost on a specific query but rather calculate the average cost on Q rep, a representative set of real queries submitted –It makes sense to discuss costs in aggregate (i.e., over all the nodes in the network) –Therefore our two cost metrics are Average aggregate bandwidth Average aggregate processing cost

37 Improving P2P Search37 Metrics 2. Results Quality –Number of results –Satisfaction 3. Time to satisfaction

38 Improving P2P Search38 Data Collection Data gathered from Gnutella network Directly measured –Iterative deepening –Directed BFS Performance data & analysis –Local indices

39 Improving P2P Search39 Data Collection Number of hops Response time Results per message Source IP Etc … Collected Data

40 Data Collection
Extracted Data
Symbol     Description
M(Q; n)    # of response messages received for query Q, from n hops away
R(Q; n)    # of results received for query Q, from n hops away
N(Q; n)    # of nodes n hops away that process Q
C(Q; n)    # of redundant edges n hops away

41 Calculating Costs
We’ve seen two types of costs
– Bandwidth (BW) costs
– Processing costs
Calculations should take into account
– Costs of sending a query
– Costs of sending replies
An example of calculating BW costs

42 Calculating Costs
$$BW_{bfs}(Q) = \sum_{n=1}^{D} \Big( a(Q) \cdot \big( N(Q,n) + C(Q,n) \big) + n \cdot \big( c \cdot R(Q,n) + d \cdot M(Q,n) \big) \Big)$$
Symbol     Description
a(Q)       Size of query Q
c          Size of result record
d          Size of response message
D          Max TTL
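
The bandwidth formula on this slide translates directly into code. The sketch below assumes the per-hop quantities are supplied as lists whose entry n-1 holds the value for n hops away, so D is the list length. The function name and argument layout are illustrative.

```python
def bw_bfs(a_q, c, d, N, C, R, M):
    """Aggregate bandwidth cost of a BFS query, following the slide's formula.

    a_q: size of the query message
    c:   size of one result record
    d:   size of one response message
    N, C, R, M: lists where entry n-1 is the value n hops from the source
    """
    D = len(N)                           # maximum TTL
    total = 0
    for n in range(1, D + 1):
        i = n - 1
        query_cost = a_q * (N[i] + C[i])             # query sent over useful and redundant edges
        reply_cost = n * (c * R[i] + d * M[i])       # replies travel n hops back to the source
        total += query_cost + reply_cost
    return total
```

With made-up sizes a(Q) = 10, c = 5, d = 20 and two hops where N = [2, 3], C = [0, 1], R = [1, 4], M = [1, 2], the first hop contributes 10·2 + 1·(5 + 20) = 45 and the second 10·4 + 2·(20 + 40) = 160, for a total of 205.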

43 Analysis of Results Iterative Deepening

44 Symbols Used
Symbol     Definition
D          Maximum time-to-live of a message, in terms of hops
Z          Number of results needed to satisfy a query
Q_rep      Representative set of queries for the Gnutella network
W          Waiting time (in seconds) between iterations
Ng         Number of neighbors of client (source node)

45 Results – Iterative Deepening
Recall that iterative deepening policies P={a,b,c,..} define deepening behavior
In order to have the same level of satisfaction as BFS, a policy must have D as its last depth
Also note the degenerate case, policy {D}, which is the bounded BFS we presented earlier

46 Results – Iterative Deepening
Variables
– Define:
 $P_d = \{ d, d+1, \ldots, D \}$
 $P = \{ P_d \text{ for } d = 1, 2, \ldots, D \} = \{ \{1, 2, \ldots, D\}, \{2, 3, \ldots, D\}, \ldots, \{D-1, D\}, \{D\} \}$
– W (waiting time) can take the values 1, 2, 4, 6, 150 (seconds)
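
For concreteness, the family of policies defined on this slide can be enumerated as follows. The function names are illustrative, and pairing every policy with every waiting time is an assumption about how the experiment grid is formed.

```python
from itertools import product

WAIT_TIMES = [1, 2, 4, 6, 150]           # seconds, as listed on the slide

def policy(d, D):
    """P_d = {d, d+1, ..., D} as an ordered list of depths."""
    return list(range(d, D + 1))

def policy_family(D):
    """All policies P_1 .. P_D; the last one, [D], is plain bounded BFS."""
    return [policy(d, D) for d in range(1, D + 1)]

def experiment_grid(D):
    """Every (policy, waiting time) pair, assuming a full cross product."""
    return list(product(policy_family(D), WAIT_TIMES))
```

For D = 3 the family is [[1, 2, 3], [2, 3], [3]], and the grid holds 3 × 5 = 15 combinations.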

47 Improving P2P Search47 Results – Iterative Deepening Fixed values Z = 50, Ng = 8 –Increasing Z Lower probability of satisfaction Higher costs More results –Decreasing Ng Slightly Lower probability of satisfaction Significantly Lower costs

48 Improving P2P Search48 Results – Iterative Deepening

49 Results – Iterative Deepening
BW costs are the same for P_7 at every W
As d increases, costs increase: the larger d is, the more likely the policy is to “overshoot”
As W decreases, costs increase: with a small W, the source prematurely decides it is unsatisfied, which again leads to overshooting

50 Improving P2P Search50 Results – Iterative Deepening

51 Results – Iterative Deepening
Time to satisfaction is inversely proportional to cost
Choose a policy that balances average waiting time and cost
For example, P_5 with W = 6

52 Analysis of Results Directed BFS

53 Heuristics - Directed BFS
Symbol     Heuristic
RAND       (Random)
>RES       Returned the greatest number of results*
<TIME      Had the shortest average time to satisfaction*
<HOPS      Had the smallest average number of hops taken by results*
>MSG       Sent our client the greatest number of messages (all types)
<QLEN      Had the shortest message queue
<LAT       Had the shortest latency
>DEG       Had the highest degree (number of neighbors)
*in the past 10 queries

54 Improving P2P Search54 Results – Directed BFS

55 Improving P2P Search55 Results – Directed BFS

56 Improving P2P Search56 Results – Directed BFS

57 Improving P2P Search57 Results – Directed BFS Costs in directed BFS unaffected by Z Users more aware of quality of results than BW costs –We recommend >RES <TIME –Still cheaper than full BFS (~65%) Sum up till now –Iterative deepening - lowest costs –Directed BFS – fastest time to satisfaction

58 Analysis of Results Local Indices

59 Results – Local Indices
Recall that local indices policies P={a,b,c,..} define the depths at which we search
We choose policies that minimize the number of nodes that process the query

60 Improving P2P Search60 Results – Local Indices We consider the following policies

61 Improving P2P Search61 Results – Local Indices Also recall that joins / leaves / updates have a BW overhead QJR (QueryJoinRatio) gives us the ratio of queries to joins/leaves in the network

62 Improving P2P Search62 Results – Local Indices P 0 r=0

63 Improving P2P Search63 Results – Local Indices

64 Improving P2P Search64 Results – Local Indices 21MB 71 KB

65 Improving P2P Search65 Results – Local Indices Time to Satisfaction –Because most Query and Response messages have r fewer hops to travel, the time to forward messages to the outermost depth and back to the source will be shorter than for BFS –However, because nodes have larger indices, processing the query should take more time.

66 Improving P2P Search66 Results – Local Indices Summary –Huge savings in costs –Time to satisfaction comparable to BFS –Determining r must take QJR into consideration For current QJR values (e.g. Gnutella = 10) r =1 is a good choice

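67 Relative performance
Bounded BFS (reference): 100%
Iterative deepening: time to satisfy 190%, satisfaction probability 100%, number of results 19%, aggregate bandwidth 28%, aggregate processing 47%
Directed BFS: time to satisfy 140%, satisfaction probability 86%, number of results 37%, aggregate bandwidth 38%, aggregate processing 28%
Local indices: time to satisfy ≈100%, satisfaction probability 100%, aggregate bandwidth 39%, aggregate processing 51%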

68 Improving P2P Search68 Conclusions All 3 methods show significant bandwidth and processing savings Methods are simple and easy to implement in current systems Methods might be used in conjunction

69 Bibliography
Yang, Beverly; Garcia-Molina, Hector:
– Improving Search in Peer-to-Peer Systems http://newdbpubs.stanford.edu:8090/pub/2002-28
– Improving Search in Peer-to-Peer Systems [extended] http://newdbpubs.stanford.edu:8090/pub/2001-47
– Designing a Super-peer Network http://newdbpubs.stanford.edu:8090/pub/2003-33
Gnutella website http://www.gnutella.com/

70 Thank you

