Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002.

1 Search and Replication in Unstructured Peer-to-Peer Networks Pei Cao, Christine Lv., Edith Cohen, Kai Li and Scott Shenker ICS 2002

2 Outline Brief survey of P2P architectures Evaluation Methodology Search Methods Replication Conclusions

3 Peer-to-Peer Networks Peers are connected by an overlay network. Users cooperate to share files (e.g., music, videos). Dynamic: nodes join or leave frequently

4 P2P Network Architectures I Centralized: –Use of a central directory server (CDS) –Peers query the CDS to find other peers that hold the desired object Pros: very efficient Cons: scales poorly; single point of failure

5 P2P Network Architectures II Decentralized: No central directory server –But structured: P2P network topology is tightly controlled; files are placed at specified locations –Unstructured: No control over network topology or file placement

6 P2P Network Architectures III Decentralized but Structured “Loosely structured” –Placement of files is based on hints “Tightly structured” –Structure of the P2P network and file placement are precisely specified –Use of a distributed hash table Pros: Efficient satisfaction of queries; good scaling Cons: No proof it works

7 P2P Network Architectures IV Decentralized and Unstructured Placement of files is not based on topology knowledge Finding files –Node queries its neighbors (usually using flooding) Pros: extremely resilient to network changes Cons: extremely unscalable; generates large loads

8 Evaluation Methodology I Terminology Network Topology: graph formed by the nodes in the network at a given instant Query Distribution: frequency of lookups to files Replication Distribution: percentage of nodes that have a particular file

9 Evaluation Methodology II Network Topologies –Power-Law Random Graph (PLRG) Max node degree: 1746, median: 1, average: 4.46 –Normal Random Graph (Random) Average and median node degree: 4 –Gnutella graph (Gnutella) Oct 2000 snapshot Max degree: 136, median: 2, average: 5.5 –Two-dimensional Grid 100x100 = 10000 nodes

10 Evaluation Methodology III Object query distribution q_i –Uniform –Zipf-like Object replication density distribution r_i –Uniform –Proportional: r_i ∝ q_i –Square-Root: r_i ∝ √q_i

11 Evaluation Methodology IV Metrics –User aspects Pr(success) #hops –Load aspects Average #messages per node #nodes visited Peak #messages

12 Limitation of Flooding I Gnutella uses a TTL to limit the number of hops a query travels Problem: –Hard to choose TTL: For objects that are widely present in the network, small TTLs suffice For objects that are rare in the network, large TTLs are necessary –Number of query messages grows exponentially as TTL grows

13 Limitation of Flooding II A node may receive the same message more than once Need for duplicate-detection mechanisms Even so, duplication increases as TTL increases in flooding
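The two flooding slides above can be illustrated with a minimal sketch. It assumes an overlay given as an adjacency dictionary and a hypothetical `flood` helper (not from the slides) in which each node forwards a query the first time it sees it and drops later copies as duplicates. It counts total messages and duplicate deliveries for a given TTL.

```python
from collections import deque


def flood(adj, source, ttl, holders):
    """TTL-limited flood from `source` (illustrative model, not Gnutella's code).

    adj:     dict mapping node -> list of neighbor nodes (assumed overlay)
    holders: set of nodes that store the requested object
    Returns (found, messages_sent, duplicate_deliveries).
    """
    seen = {source}              # nodes that already received this query
    found = source in holders
    messages = 0
    duplicates = 0
    frontier = deque([(source, None, ttl)])   # (node, sender, remaining TTL)
    while frontier:
        node, sender, remaining = frontier.popleft()
        if remaining == 0:
            continue             # TTL exhausted, do not forward further
        for neighbor in adj[node]:
            if neighbor == sender:
                continue         # never send the query back to its sender
            messages += 1
            if neighbor in seen:
                duplicates += 1  # duplicate detection: drop repeated query
                continue
            seen.add(neighbor)
            if neighbor in holders:
                found = True
            frontier.append((neighbor, node, remaining - 1))
    return found, messages, duplicates
```

On a 4-node ring, raising the TTL from 1 to 3 raises both the message count and the number of duplicates, even with duplicate detection in place.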

14 Limitation of Flooding Conclusion Flooding increases per-node overhead Need for more scalable search methods: –Expanding Ring –Random Walks

15 Expanding Ring Adaptively Adjust TTL –Multiple floods: start with TTL=1; increment TTL by 2 each time until search succeeds Still have duplicate messages
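A minimal sketch of expanding ring as described on this slide: repeated floods starting at TTL=1 and adding 2 each time until the object is found. The helper names, the adjacency-dictionary overlay, and the `max_ttl` cutoff are assumptions made for illustration; the slide does not say when to give up.

```python
def flood_once(adj, source, ttl, holders):
    """One TTL-limited flood with duplicate suppression.

    Returns (found, messages_sent). Illustrative helper, not from the slides.
    """
    seen = {source}
    found = source in holders
    messages = 0
    frontier = [(source, None)]          # (node, sender) pairs at this hop
    for _ in range(ttl):
        next_frontier = []
        for node, sender in frontier:
            for neighbor in adj[node]:
                if neighbor == sender:
                    continue
                messages += 1
                if neighbor not in seen:
                    seen.add(neighbor)
                    found = found or neighbor in holders
                    next_frontier.append((neighbor, node))
        frontier = next_frontier
    return found, messages


def expanding_ring(adj, source, holders, start_ttl=1, step=2, max_ttl=9):
    """Flood with growing TTL until the object is found or max_ttl is passed.

    Returns (ttl_at_success or None, total_messages_over_all_floods).
    """
    if source in holders:
        return 0, 0
    total = 0
    ttl = start_ttl
    while ttl <= max_ttl:                # max_ttl is an assumed cutoff
        found, messages = flood_once(adj, source, ttl, holders)
        total += messages                # earlier floods are paid for again
        if found:
            return ttl, total
        ttl += step
    return None, total
```

Each new flood revisits the nodes covered by the previous one, which is why the total message count includes every earlier round.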

16 Random Walk Simple random walk –Takes too long to find anything Multiple-walker random walk –K walkers each taking T steps visit about as many nodes as 1 walker taking K*T steps –More messages → more overhead –When to terminate the search: TTL, or checking (walkers check back with the query originator once every C steps)
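A minimal sketch of a K-walker random walk with periodic checking, under simplifying assumptions: all walkers move in lockstep, a walker learns that the query is satisfied only at a checkpoint (every `check_every` steps), and only walk messages are counted, not the check messages themselves. The function name and parameters are hypothetical.

```python
import random


def k_walker_search(adj, source, holders, k=32, check_every=4,
                    max_steps=1024, seed=0):
    """K independent random walkers started at `source`.

    adj:     dict mapping node -> list of neighbor nodes (assumed overlay)
    holders: set of nodes that store the requested object
    Returns (found, step_of_first_hit or None, walk_messages).
    """
    rng = random.Random(seed)
    if source in holders:
        return True, 0, 0
    walkers = [source] * k
    messages = 0
    first_hit = None
    for step in range(1, max_steps + 1):
        for i in range(k):
            walkers[i] = rng.choice(adj[walkers[i]])  # one hop = one message
            messages += 1
            if first_hit is None and walkers[i] in holders:
                first_hit = step
        # Walkers stop only when they check back and learn of the success,
        # so up to check_every - 1 extra steps are spent after the first hit.
        if step % check_every == 0 and first_hit is not None:
            break
    return first_hit is not None, first_hit, messages
```

The gap between the first hit and the next checkpoint is the overhead that checking trades for not having to pick a TTL in advance.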

17 Search Traffic Comparison

18 Search Delay Comparison

19 Lessons Learned about Search Methods Key: Cover the right number of nodes as quickly as possible and with as little overhead as possible Pay attention to –Adaptive termination –Minimizing message duplication –Small expansion in each step

20 Replication In unstructured P2P systems, search success is essentially about coverage: visiting enough nodes to find the object => replication density matters Goal: minimize average search size (number of probes until the query is satisfied) Theoretical optimum: copy everything everywhere –But node storage is limited

21 Replication Strategies Uniform Replication –p_i = 1/m –Simple, resources are divided equally Proportional Replication –p_i = q_i –“Fair”, resources per item proportional to demand –Reflects current P2P practices

22 Square-Root Replication p_i is proportional to √q_i Lies “in between” Uniform and Proportional
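The three strategies can be compared numerically with a minimal sketch. It assumes a Zipf-like query distribution over m objects and models the expected number of probes for object i as 1/p_i, where p_i is its share of the replication budget (random probing; constant factors dropped). That cost model and all names are assumptions for illustration, not taken from the slides.

```python
import math


def zipf(m, alpha=1.0):
    """Zipf-like query probabilities q_1..q_m with exponent alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, m + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def allocate(q, strategy):
    """Share of the replication budget p_i given to each object."""
    if strategy == "uniform":
        weights = [1.0] * len(q)
    elif strategy == "proportional":
        weights = list(q)
    elif strategy == "square-root":
        weights = [math.sqrt(qi) for qi in q]
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    total = sum(weights)
    return [w / total for w in weights]


def expected_search_size(q, p):
    """Average probes per query under the assumed 1/p_i cost model."""
    return sum(qi / pi for qi, pi in zip(q, p))


if __name__ == "__main__":
    q = zipf(100)
    for name in ("uniform", "proportional", "square-root"):
        print(name, expected_search_size(q, allocate(q, name)))
```

Under this model, Uniform and Proportional give the same average search size (both equal m), while Square-Root gives a smaller one, which matches the slide's placement of Square-Root between the two.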

23 Achieving Square-Root Replication I Assume that each query keeps track of the number of probes needed Store an object at a number of nodes that is proportional to the number of probes Two implementations: –Path replication: store the object along the path of a successful “walk” –Random replication: store the object randomly among nodes visited by the agents
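Path replication can be sketched as follows, assuming a single random walker for simplicity (the slides evaluate multi-walker search) and a hypothetical `store` dictionary mapping each node to the set of objects it holds. After a successful walk, the object is copied to every node on the path, so the number of new copies grows with the number of probes.

```python
import random


def walk_and_replicate(adj, source, obj, store, rng, max_steps=1000):
    """Random-walk search for `obj`, then replicate it along the path.

    adj:   dict mapping node -> list of neighbor nodes (assumed overlay)
    store: dict mapping node -> set of objects held (mutated on success)
    Returns the number of probes (hops) used, or None if the walk gave up.
    """
    node = source
    path = [source]
    while obj not in store[node] and len(path) <= max_steps:
        node = rng.choice(adj[node])
        path.append(node)
    if obj not in store[node]:
        return None                  # gave up after max_steps hops
    for visited in path:
        store[visited].add(obj)      # path replication: copy along the walk
    return len(path) - 1
```

Random replication would differ only in the last loop: instead of the nodes on one walk, it would pick the same number of nodes at random from all nodes visited by the walkers.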

24 Achieving Square-Root Replication II

25 Evaluation of Replication Methods I Metrics –Overall message traffic –Search delay Dynamic simulation –Assume Zipf-like object query probability –Poisson query arrivals at 5 queries/sec –Results are collected from 5000 sec to 9000 sec –Search method: 32-walker random walk with state keeping, checking every 4 steps

26 Evaluation of Replication Methods II Square-Root Replication reduces search traffic

27 Evaluation of Replication Methods III

28 Conclusions Multi-walker random walk scales much better than flooding –Can find data more quickly –Reduces the traffic overload Square-root replication distribution is desirable –Minimizes search delay –Minimizes the overall search traffic
