
Slide 1: Distributed Hash Tables and Structured P2P Systems. Ningfang Mi, September 27, 2004

Slide 2: Outline
 The lookup problem: definition, approaches, issues
 Case study: CAN: overview, basic design, further design improvements (performance, robustness), topology-aware routing in CAN
 Conclusion

Slide 3: The Lookup Problem: Definition
Given a data item X stored at some dynamic set of nodes, find it. This lookup is a fundamental problem common to all P2P systems: how do we find X efficiently?

Slide 4: The Lookup Problem: Approaches
 Central database: Napster. Central point of failure.
 Hierarchy: DNS-style name lookup. Failure or removal of the root, or of nodes high in the hierarchy, breaks lookups; the root and upper nodes are easily overloaded.
 Symmetric lookup algorithm: Gnutella. No node is more important than any other, but "flooding" of requests does not scale well.
 "Superpeers" in a hierarchical structure: KaZaA. Vulnerable to failure of the superpeers.
 Innovative symmetric lookup strategy: Freenet. May need to visit a large number of nodes, with no guarantee of success.

Slide 5: The Lookup Problem: Approaches (cont.)
Structured and symmetric algorithms implement a DHT (distributed hash table):
 Convert a name to a key using a hash function such as SHA-1
 Expose one operation: lookup(key)
Examples: CAN [2] (ACM SIGCOMM 2001), Chord [3] (ACM SIGCOMM 2001), Pastry [4] (Middleware 2001), Tapestry [5] (UC Berkeley, 2001)
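As a rough illustration of this shared interface, here is a minimal single-process sketch, not any real system's API: the name_to_key construction and the Chord-style ownership rule are illustrative assumptions (CAN, for instance, owns keys via zones instead).

```python
import hashlib

def name_to_key(name: str) -> int:
    # SHA-1 maps an arbitrary name to a fixed-size 160-bit key.
    return int(hashlib.sha1(name.encode()).hexdigest(), 16)

class ToyDHT:
    """Illustrative only: a 'distributed' table simulated in one process."""
    def __init__(self, node_ids):
        self.nodes = {nid: {} for nid in sorted(node_ids)}

    def _owner(self, key: int) -> int:
        # Assumption: Chord-style rule -- the first node id at or above
        # the (reduced) key owns it, wrapping around at the top.
        ids = sorted(self.nodes)
        reduced = key % (max(ids) + 1)
        return next((n for n in ids if n >= reduced), ids[0])

    def put(self, name, value):
        self.nodes[self._owner(name_to_key(name))][name] = value

    def lookup(self, name):
        return self.nodes[self._owner(name_to_key(name))].get(name)

dht = ToyDHT(node_ids=[10, 42, 77])
dht.put("song.mp3", "contents")
print(dht.lookup("song.mp3"))   # -> "contents"
```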

Slide 6: The Lookup Problem: Issues
 Mapping keys to nodes in a load-balanced way
 Forwarding a lookup for a key to an appropriate node
 The distance function
 Building routing tables adaptively
We discuss these issues through a case study: CAN.

Slide 7: "What is a scalable content-addressable network?" CAN's Design Overview
Features:
 Completely distributed: no centralized control, coordination, or configuration
 Scalable: nodes maintain only a small amount of control state, independent of the number of nodes
 Content-addressable: resembles a hash table mapping keys to values
 Fault-tolerant: can route around failures

Slide 8: "How does a scalable CAN work?" CAN's Design Overview
 Uses a virtual d-dimensional Cartesian coordinate space; zones are hyper-rectangles
 A CAN is composed of individual nodes: each node "owns" its own distinct zone and "holds" state information about its neighbor zones
 Three operations on (key, value) pairs: insertion, lookup, and deletion
 Requests for a key are routed using a greedy routing algorithm

Slide 9: A 2-d coordinate space with 5 nodes
 Map key k1 onto a point p1 using a uniform hash function
 Store (k1, v1) at the node that owns the zone containing p1
 To retrieve (k1, v1), apply the same hash function to map k1 onto point p1 and get the value from the node owning p1
[Figure: a 2-d space with points p1 for (k1, v1) and p2 for (k2, v2)]
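A minimal sketch of this key-to-point mapping; slicing the SHA-1 digest into coordinates is an assumed construction, since the design only requires some uniform hash function:

```python
import hashlib

def key_to_point(key: str, d: int = 2):
    # Assumption: derive each coordinate from a different slice of the
    # SHA-1 digest and scale it into [0, 1).
    digest = hashlib.sha1(key.encode()).digest()
    chunk = len(digest) // d
    coords = []
    for i in range(d):
        block = digest[i * chunk:(i + 1) * chunk]
        coords.append(int.from_bytes(block, "big") / 2 ** (8 * chunk))
    return tuple(coords)

# (k1, v1) is stored at whichever node owns the zone containing this point:
print(key_to_point("k1"))   # e.g. a point like (0.53..., 0.91...)
```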

Slide 10: Basic Design for a CAN
The three most basic pieces of the design:
 CAN routing
 Construction of the CAN overlay
 Maintenance of the CAN overlay

Slide 11: "How can we find a point (x, y)?" Routing in a CAN
 Each node maintains a routing table holding the IP address and virtual coordinate zone of each of its neighbors
 Greedy forwarding: pass the CAN message to the neighbor closest to the destination
 For a d-dimensional space partitioned into n equal zones: each node maintains 2d neighbors, and the average routing path length is (d/4)(n^(1/d)) hops
[Figure: 2-d space with nodes A-G routing toward point (0.2, 0.4); D's neighbor set = {C, E}, C's neighbor set = {B, D, E, F}, B's neighbor set = {A, C, G}]
A sketch of greedy forwarding appears below.
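In this sketch, zone ownership is approximated by a distance threshold to each node's zone-center point; a real CAN checks zone boundaries and wraps around the torus. Node names and coordinates are made up for illustration.

```python
import math

def greedy_route(points, neighbors, start, target):
    """points: node -> zone-center point; neighbors: node -> set of nodes.
    Minimal sketch: hop to whichever neighbor is closest to target."""
    path, cur = [start], start
    while math.dist(points[cur], target) > 0.05:   # "owns target" stand-in
        nxt = min(neighbors[cur],
                  key=lambda n: math.dist(points[n], target))
        if math.dist(points[nxt], target) >= math.dist(points[cur], target):
            break                                   # no closer neighbor
        path.append(nxt)
        cur = nxt
    return path

pts = {"B": (0.25, 0.75), "C": (0.25, 0.25), "D": (0.375, 0.375)}
nbrs = {"B": {"C"}, "C": {"B", "D"}, "D": {"C"}}
print(greedy_route(pts, nbrs, "B", target=(0.4, 0.4)))  # ['B', 'C', 'D']
```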

Slide 12: "I want to join!" CAN Construction
 Find a node already in the CAN: retrieve a bootstrap node's IP address by looking up the CAN domain name in DNS; the bootstrap node provides IP addresses of random nodes already in the CAN
 Find a zone in the space: randomly choose a point P, then send a JOIN request toward P via an existing CAN node A; node B, whose zone contains P, splits that zone and allocates "half" of it, with the associated (key, value) pairs, to the new node (a sketch of the split follows)
 Join the routing by notifying the neighbors: the new node learns the IP addresses of its neighbor set from the previous occupant, the previous occupant updates its own neighbor set, and both new and old neighbors are informed of the reallocation
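A sketch of the zone split that node B performs. The tuple zone representation and the split order that cycles through dimensions by depth are assumptions for illustration; the design only requires a well-defined ordering of split dimensions.

```python
def split_zone(zone, depth):
    """zone: ((x_lo, y_lo), (x_hi, y_hi)). Halve the occupant's zone
    along one axis, cycling through dimensions by split depth."""
    lo, hi = list(zone[0]), list(zone[1])
    axis = depth % len(lo)
    mid = (lo[axis] + hi[axis]) / 2
    old_half = (tuple(lo), tuple(hi[:axis] + [mid] + hi[axis + 1:]))
    new_half = (tuple(lo[:axis] + [mid] + lo[axis + 1:]), tuple(hi))
    return old_half, new_half

print(split_zone(((0.0, 0.0), (1.0, 1.0)), depth=0))
# -> (((0.0, 0.0), (0.5, 1.0)), ((0.5, 0.0), (1.0, 1.0)))
```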

Slide 13: [Figure: node H joins at point (0.2, 0.1), splitting A's zone. Before: A's neighbor set = {B, G}, B's neighbor set = {A, C, G}, G's neighbor set = {A, B, F}. After: A's neighbor set = {G, H}, B's neighbor set = {C, G, H}, G's neighbor set = {A, B, F, H}, H's neighbor set = {A, B, G}]

Slide 14: "I want to leave!" CAN Maintenance
 The departing node's zone must be taken over by some node
 Normal departure: explicitly hand over the zone and its (key, value) pairs to one neighbor; merge into a single valid zone if possible, otherwise the neighbor with the smallest zone handles both zones temporarily
 Failure: the immediate-takeover algorithm; neighbors initialize takeover timers, and the live neighbor with the smallest zone takes over the failed node's zone (a sketch of the timers follows)
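The CAN design initializes each neighbor's takeover timer in proportion to its own zone volume, so the smallest-volume live neighbor usually fires first. A sketch, where the proportionality constant c is an arbitrary assumption:

```python
def zone_volume(zone):
    lo, hi = zone
    v = 1.0
    for a, b in zip(lo, hi):
        v *= b - a
    return v

def takeover_timers(neighbor_zones, c=10.0):
    # Assumption: timer = c * volume, standing in for a timer
    # "initialized in proportion to the node's own zone volume".
    return {n: c * zone_volume(z) for n, z in neighbor_zones.items()}

# The neighbor whose timer fires first sends a TAKEOVER message;
# the others cancel their own timers on seeing it.
print(takeover_timers({"E": ((0.0, 0.0), (0.5, 0.5)),
                       "F": ((0.5, 0.0), (1.0, 1.0))}))
```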

Slide 15: [Figure: departure scenarios in the 2-d space. Scenario 1: H leaves explicitly. Scenario 2: C leaves explicitly. Scenario 3: G dies]

Slide 16: "Can we do better?" Design Improvements (1)
 The basic design balances low per-node state, O(d), against short path lengths, O(d n^(1/d))
 These are application-level hops, not IP-level hops: neighboring CAN nodes may be geographically distant, many IP hops apart
 Average total lookup latency = (average number of CAN hops) x (average latency of each CAN hop)
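A back-of-envelope example of this decomposition, with all numbers assumed for illustration:

```python
# Average path length for a d-dimensional CAN with n equal zones
# is (d/4) * n**(1/d), per the slide above.
d, n = 2, 256                       # assumed dimensions and node count
avg_hops = (d / 4) * n ** (1 / d)   # = 0.5 * 16 = 8 CAN-level hops
avg_hop_latency_ms = 80             # assumed per-CAN-hop latency
print(avg_hops * avg_hop_latency_ms)  # -> 640.0 ms average lookup latency
```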

Slide 17: "Can we do better?" Design Improvements (2)
 Primary goal: reduce the latency of CAN routing, via both the path length and the per-CAN-hop latency
 Other goals: robustness (routing and data availability) and load balancing
 The CAN design was simulated on Transit-Stub (TS) topologies generated with the GT-ITM topology generator

Slide 18: Multi-Dimensional Coordinate Spaces (1)
Increasing the number of dimensions d affects:
 Path length (and hence latency): decreases
 Size of the coordinate routing table: grows with d, but stays small (2d neighbors)
 Fault tolerance: improves, since each node has more next-hop choices
[Figure: effect of the number of dimensions on path length]

Slide 19: Realities: Multiple Coordinate Spaces (2)
 Maintain multiple independent coordinate spaces, called realities
 For a CAN with r realities: each node is assigned r zones, one per reality, and holds r independent neighbor sets
 The contents of the hash table are replicated in every reality
[Figure: r = 2; key k1 maps to point (0.6, 0.8), so (k1, v1) is stored in both coordinate spaces, at different nodes]

Slide 20: Realities: Multiple Coordinate Spaces (2, cont.)
Benefits of more realities:
 Higher data availability
 Stronger fault tolerance
 Shorter path length (lower latency)
[Figure: effect of multiple realities on path length]

Slide 21: Dimensions vs. Realities
 More dimensions have the greater effect on path length
 More realities provide stronger fault tolerance and increased data availability
[Figure: path length with increasing neighbor state]

Slide 22: Better CAN Routing Metrics (3)
 Basic CAN routing metric: Cartesian distance in the virtual coordinate space
 RTT-weighted routing (round-trip time): reflects the underlying IP topology; each node measures the RTT to each of its neighbors and forwards a message to the neighbor with the maximum ratio of progress to RTT
 This reduces the latency of individual hops (see the sketch below)
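A sketch of the RTT-weighted next-hop choice; the data structures and the definition of "progress" as the reduction in Cartesian distance are illustrative assumptions:

```python
import math

def rtt_weighted_next_hop(cur, target, neighbors, points, rtt_ms):
    """Among neighbors that make progress toward the target, pick the
    one maximizing progress / RTT. (Plain greedy CAN routing would
    maximize progress alone.)"""
    here = math.dist(points[cur], target)
    best, best_ratio = None, 0.0
    for n in neighbors[cur]:
        progress = here - math.dist(points[n], target)
        if progress <= 0:
            continue                     # would move away from the target
        ratio = progress / rtt_ms[(cur, n)]
        if ratio > best_ratio:
            best, best_ratio = n, ratio
    return best

pts = {"B": (0.25, 0.75), "C": (0.25, 0.25), "E": (0.6, 0.6)}
print(rtt_weighted_next_hop("B", (0.9, 0.4),
                            {"B": {"C", "E"}}, pts,
                            {("B", "C"): 120, ("B", "E"): 20}))  # -> 'E'
```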

Slide 23: Overloading Coordinate Zones (4)
 Multiple nodes share the same zone, up to a threshold of MAXPEERS nodes
 A node knows all peers in its own zone, but only one node in each of its neighbor zones
 How are zones overloaded? When A asks to join B's zone: if B's zone holds fewer than MAXPEERS peers, A joins it and the zone's key-value pairs are replicated to A; otherwise B's zone is split, and both its peer list and its key-value pairs are divided (a sketch follows)
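A sketch of that join-time decision; the MAXPEERS value, the data structures, and the split callback are assumptions for illustration:

```python
MAXPEERS = 4   # assumed threshold

def handle_join(zone_peers, joiner, split_zone_fn):
    """Overloaded-zone join: either absorb the joiner (replicating the
    zone's key-value pairs to it) or, at the MAXPEERS threshold, split
    the zone and divide peers and data between the halves."""
    if len(zone_peers) < MAXPEERS:
        zone_peers.append(joiner)         # joiner replicates (k, v) pairs
        return "joined existing zone"
    half_a = zone_peers[: len(zone_peers) // 2] + [joiner]
    half_b = zone_peers[len(zone_peers) // 2:]
    split_zone_fn(half_a, half_b)         # split zone, divide (k, v) pairs
    return "zone split"

peers = ["B1", "B2"]
print(handle_join(peers, "A", lambda a, b: None))  # -> joined existing zone
```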

Slide 24: Overloading Coordinate Zones (4, cont.)
Advantages:
 Reduced path length (number of hops)
 Reduced per-hop latency
 Improved fault tolerance
Disadvantage:
 Adds somewhat to system complexity

Slide 25: Multiple Hash Functions (5)
 Use k different hash functions to map each key onto k points, and replicate each (key, value) pair at k distinct nodes
 Improves data availability and reduces average query latency (queries can be issued in parallel)
 Cost: the size of the (key, value) database and the query traffic both grow by a factor of k (see the sketch below)
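A sketch of deriving the k points; salting one hash function with an index is an assumed stand-in for k truly independent hash functions:

```python
import hashlib

def k_points(key: str, k: int = 3):
    # Assumption: prefix the key with an index to simulate k independent
    # hash functions; each resulting point holds a replica of (key, value).
    pts = []
    for i in range(k):
        h = hashlib.sha1(f"{i}:{key}".encode()).digest()
        x = int.from_bytes(h[:8], "big") / 2 ** 64
        y = int.from_bytes(h[8:16], "big") / 2 ** 64
        pts.append((x, y))
    return pts

# A query can be sent toward all k points in parallel and take the first
# answer, cutting average latency at the price of k times the storage.
print(k_points("movie.mp4"))
```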

Slide 26: Topologically-Sensitive Construction of the CAN Overlay Network (6)
 Problem with randomly allocating nodes to zones: strange routing scenarios and inefficient routing
 Goal: construct a CAN overlay that is congruent with the underlying IP topology
 Landmarks: a well-known set of machines, such as DNS servers
 Each node measures its RTT to each landmark and sorts the landmarks in order of increasing RTT; with m landmarks there are m! possible orderings
 Partition the coordinate space into m! equal-size partitions; a node joins the CAN at a random point in the partition corresponding to its landmark ordering (a sketch follows)
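A sketch of mapping a node's landmark ordering to one of the m! partitions; the index-based partition numbering is an assumed encoding, and the real scheme then picks a random point inside that partition of the space:

```python
from itertools import permutations

LANDMARKS = ["l1", "l2", "l3"]              # m = 3 well-known machines

def partition_index(rtts):
    """rtts: landmark -> measured RTT in ms. Returns which of the
    m! = 6 partitions this node should join."""
    ordering = tuple(sorted(LANDMARKS, key=lambda l: rtts[l]))
    all_orderings = sorted(permutations(LANDMARKS))
    return all_orderings.index(ordering)

# A node closest to l2, then l3, then l1:
print(partition_index({"l1": 80, "l2": 15, "l3": 40}))   # -> 3
```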

Slide 27: [Figure: m = 3 landmarks, so m! = 6 partitions, one per ordering: l1 l2 l3, l1 l3 l2, l2 l1 l3, l2 l3 l1, l3 l1 l2, l3 l2 l1. Node H, with ordering l2 l3 l1, joins at a random point (0.8, 0.6) inside that partition]

Slide 28: Topologically-Sensitive Construction of the CAN Overlay Network (6, cont.)
 Evaluation metric: latency stretch = (average latency on the CAN network) / (average latency on the IP network)
 Side effect: the coordinate space is no longer uniformly populated, so background load-balancing techniques are required

Slide 29: Load Balancing Techniques (7)
 More uniform partitioning (volume balancing): when a JOIN is received, a node compares its own zone volume with its neighbors' zone volumes, and the zone with the largest volume is the one that is split (see the sketch below); result: zone volumes concentrate around V = V_T / n, where V_T is the total volume
 Caching and replication for "hot spots": a node caches the data keys it has recently accessed, and a "hot" data key is replicated at the neighboring nodes
Results:
  Without volume balancing: 40% of nodes have volume V; largest zone volume is 8V
  With volume balancing: 90% of nodes have volume V; largest zone volume is 4V
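A sketch of the volume-balancing check; the corner-tuple zone representation is an assumption carried over from the earlier sketches:

```python
def zone_volume(zone):
    lo, hi = zone
    v = 1.0
    for a, b in zip(lo, hi):
        v *= b - a
    return v

def join_target(own_zone, neighbor_zones):
    """On receiving a JOIN, split whichever of this node's zone and its
    neighbors' zones has the largest volume, driving zone volumes
    toward V = V_T / n."""
    zones = {"self": own_zone, **neighbor_zones}
    return max(zones, key=lambda name: zone_volume(zones[name]))

own = ((0.0, 0.0), (0.5, 0.5))
nbrs = {"E": ((0.5, 0.0), (1.0, 1.0))}
print(join_target(own, nbrs))   # -> 'E' (volume 0.5 > 0.25), so E splits
```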

Slide 30: Design Review
Simulation with n = 2^18 nodes, varying:
 d: number of dimensions
 r: number of realities
 p: number of peers per zone
 k: number of hash functions

Slide 31: Topology-Aware Routing in CAN
"It is critical for overlay routing to be aware of the network topology" [6]

Slide 32: [Figure: the same four nodes A, B, C, D shown in several arrangements, contrasting overlay neighbor relationships with the nodes' physical placement]

Slide 33: Approaches to Topology-Aware Routing in CAN
 Proximity routing
 Topology-based nodeId assignment
 Proximity neighbor selection (does not work for CAN)

Slide 34: Proximity Routing
 The overlay is constructed without regard for the physical network topology
 Among the set of possible next hops, select the one that is closest in the physical network, or that represents a good compromise between progress in the id space and physical proximity
 Example: the RTT-weighted CAN routing metric from slide 22

Slide 35: Topology-Based NodeId Assignment
 Map the overlay's logical id space onto the physical network
 Example 1: landmark binning [5][7]. Drawbacks: destroys the uniform population of the id space; landmark sites can become overloaded; the binning is coarse-grained and has difficulty distinguishing relatively close nodes
 Example 2: SAT-Match [8]

Slide 36: Self-Adaptive Topology (SAT) Matching
Two phases:
 Probing phase: as soon as a node joins the system, it probes nearby nodes for distance measurements
 Jumping phase: it then picks the closest zone accordingly and jumps to it
The iterative process repeats until the node is close enough to the zone where all its physically close neighbors are located; in this way a global topology-matching optimization is achieved.

Slide 37: SAT-Matching: Probing Phase (1)
 Effective flooding: flooding with a small TTL (number of hops) is highly effective and produces few redundant messages
 Having joined the system at its DHT-assigned position, the source node floods a message (source IP address, source timestamp, small TTL k) to its neighbors
 Each node that receives the message responds to the source with its IP address and re-floods the message with TTL k - 1, as long as k - 1 > 0 (see the sketch below)
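A sketch of collecting the TTL-k neighborhood that such a flood reaches. The overlay is modeled as an adjacency map; real probing exchanges actual messages with timestamps rather than walking a local graph:

```python
def probe_flood(graph, source, ttl=2):
    """Return the set of nodes within ttl hops of source, i.e. the nodes
    that would receive the flooded probe and respond with their IPs."""
    seen, frontier = {source}, {source}
    for _ in range(ttl):
        frontier = {n for f in frontier for n in graph[f]} - seen
        seen |= frontier
    return seen - {source}

g = {1: {2, 3}, 2: {1, 4}, 3: {1}, 4: {2, 5}, 5: {4}}
print(probe_flood(g, source=1, ttl=2))   # -> {2, 3, 4}
```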

Slide 38: SAT-Matching: Probing Phase (2)
Example: the TTL-2 neighborhood of a source node.
[Figure: a 14-node overlay highlighting node 2's TTL-2 neighborhood and node 13's TTL-2 neighborhood]

Slide 39: SAT-Matching: Probing Phase (3)
 After collecting the list of IP addresses, the source node uses a ping facility to measure the RTT (round-trip time) to each node that responded
 It sorts these RTTs and selects the two nodes with the smallest RTTs
 It then selects the zone associated with one of these two nodes to jump into

Slide 40: SAT-Matching: Jumping Phase
 Simply jumping to the closest zone is not always a win: the stretch reduction could be offset by latency increases on the other, newly formed connections
 Jumping criterion: jump only when the local stretch over the TTL-1 neighborhoods of the source and the sink (the two closest nodes) is reduced
 How to jump (X jumps to Y): X returns its zone and (key, value) pairs to its neighbor; Y allocates half of its zone and pairs to X

Slide 41: Discussion
 Is there a way to make structured P2P more practical? Participation is not free: in CAN, peers are required to follow the protocol and to be responsible for some data keys. Moreover, only exact-key search is supported.
 Can topology-aware routing be made cost-effective in a highly dynamic environment?

Slide 42: Conclusion
CAN, Chord, Pastry, and Tapestry are representative structured P2P systems:
 Self-organizing
 Scalable
 Distributed (DHT-based)
 Fault-tolerant

System   | Routing-table size | Number of hops
CAN      | O(d)               | O(n^(1/d))
Chord    | O(log n)           | O(log n)
Pastry   | O(log n)           | O(log n)
Tapestry | O(log n)           | O(log n)

Slide 43: References
[1] Hari Balakrishnan et al., "Looking Up Data in P2P Systems", Communications of the ACM, Vol. 46, No. 2, 2003.
[2] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker, "A Scalable Content-Addressable Network", ACM SIGCOMM 2001.
[3] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications", ACM SIGCOMM 2001.
[4] Antony Rowstron and Peter Druschel, "Pastry: Scalable, Decentralized Object Location and Routing for Large-Scale Peer-to-Peer Systems", Middleware 2001.
[5] B. Zhao, J. Kubiatowicz, and A. Joseph, "Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing", U.C. Berkeley, 2001.
[6] M. Castro, P. Druschel, Y. C. Hu, and A. Rowstron, "Topology-Aware Routing in Structured Peer-to-Peer Overlay Networks", Intl. Workshop on Future Directions in Distributed Computing, June 2002.
[7] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, "Topologically-Aware Overlay Construction and Server Selection", IEEE INFOCOM 2002.
[8] Shansi Ren, Lei Guo, Song Jiang, and Xiaodong Zhang, "SAT-Match: A Self-Adaptive Topology Matching Method to Achieve Low Lookup Latency in Structured P2P Overlay Networks", Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04), April 2004.

