1 Distributed Hash Tables and Structured P2P Systems
Ningfang Mi
September 27, 2004

2 Outline
The lookup problem
  - Definition
  - Approaches
  - Issues
Case study: CAN
  - Overview
  - Basic design
  - More design improvements (performance, robustness)
  - Topology-aware routing in CAN
Conclusion

3 The Lookup Problem -- Definition
Given a data item X stored at some dynamic set of nodes, find it.
This is a common and critical problem in P2P systems: how can X be found efficiently?

4 The Lookup Problem -- Approaches
Central database: Napster
  - Central point of failure
Hierarchy: DNS for name lookup
  - Failure or removal of the root, or of nodes high in the hierarchy, is costly
  - The root and nodes high in the hierarchy can become overloaded
Symmetric lookup algorithm: Gnutella
  - No node is more important than any other
  - "Flooding" of requests does not scale well
"Superpeers" in a hierarchical structure: KaZaA
  - Failure of superpeers
Innovative symmetric lookup strategy: Freenet
  - May need to visit a large number of nodes, with no guarantee of finding the data

5 The Lookup Problem -- Approaches
Structured and symmetric algorithms: implement a DHT (distributed hash table)
  - Convert each name to a key using a hash function such as SHA-1
  - One operation: Lookup(key)
Examples:
  - CAN [2] -- ACM SIGCOMM 2001
  - Chord [3] -- ACM SIGCOMM 2001
  - Pastry [4] -- Middleware 2001
  - Tapestry [5] -- UC Berkeley 2001
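As a concrete illustration of this interface, here is a minimal sketch (the names are assumptions for illustration, not any particular system's API): names are hashed with SHA-1 into fixed-length keys, and the only operation the overlay must support is Lookup(key).

```python
import hashlib

def name_to_key(name: str) -> int:
    """Map an arbitrary name to a 160-bit key with SHA-1."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class DHT:
    """Hypothetical interface; CAN, Chord, Pastry, and Tapestry differ in
    how lookup(key) is routed, not in the interface itself."""
    def lookup(self, key: int):
        raise NotImplementedError

key = name_to_key("song.mp3")  # the same name yields the same key on every node
```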

6 The Lookup Problem -- Issues
  - Mapping keys to nodes in a load-balanced way
  - Forwarding a lookup for a key to an appropriate node
  - Choosing a distance function
  - Building routing tables adaptively
These issues are discussed through a case study: CAN.

7 "What is a scalable content-addressable network?" --- CAN's Design Overview
Features:
  - Completely distributed: no centralized control, coordination, or configuration
  - Scalable: each node maintains only a small amount of control state, independent of the number of nodes
  - Content-addressable: resembles a hash table mapping keys to values
  - Fault-tolerant: can route around failures

8 "How does a scalable CAN work?" --- CAN's Design Overview
  - Uses a virtual d-dimensional Cartesian coordinate space, partitioned into zones (hyper-rectangles)
  - A CAN is composed of individual nodes: each node "owns" its own distinct zone and "holds" state about its neighbor zones
  - Three operations on (key, value) pairs: insertion, lookup, and deletion
  - Requests for a key are routed using a greedy routing algorithm

9 Figure: a 2-d coordinate space with 5 nodes; example points p1 (k1, v1) and p2 (k2, v2).
  - Map key k1 onto a point p1 using a uniform hash function
  - Store (k1, v1) at the node that owns the zone containing p1
  - To retrieve (k1, v1), apply the same hash function to map k1 onto point p1 and fetch the value from the node owning p1
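A minimal sketch of this key-to-point mapping, assuming the deck's "uniform hash" means hashing the key once per dimension and scaling into [0, 1): because the mapping is deterministic, every node can recompute where (k1, v1) lives.

```python
import hashlib

def key_to_point(key: str, d: int = 2):
    """Deterministically map a key to a point in the d-dimensional unit cube."""
    coords = []
    for axis in range(d):
        h = hashlib.sha1(f"{axis}:{key}".encode()).digest()
        coords.append(int.from_bytes(h, "big") / 2**160)  # scale into [0, 1)
    return tuple(coords)

p1 = key_to_point("k1")  # (k1, v1) is stored at the node owning the zone containing p1
```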

10 Basic Design for a CAN
The three most basic pieces of the design:
  - CAN routing
  - Construction of the CAN overlay
  - Maintenance of the CAN overlay

11 "How can we find a point (x, y)?" --- Routing in a CAN
  - Each node maintains a routing table holding the IP address and virtual coordinate zone of each of its neighbors
  - Greedy forwarding: send the CAN message to the neighbor closest to the destination
  - For a d-dimensional space partitioned into n equal zones:
    - each node maintains 2d neighbors
    - average routing path length: (d/4)(n^{1/d}) hops
Figure: routing toward point (0.2, 0.4); D's neighbor set = {C, E}, C's = {B, D, E, F}, B's = {A, C, G}.
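A sketch of greedy forwarding under stated assumptions (the CAN paper gives no code; node coordinates here are illustrative). The coordinate space wraps around, so distance is measured on the torus.

```python
def torus_distance(p, q):
    """Euclidean distance on the unit d-torus (coordinates wrap around)."""
    return sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)) ** 0.5

def next_hop(neighbors, dest):
    """neighbors: {node_id: zone center point}. Greedily pick the neighbor
    whose zone center is closest to the destination point."""
    return min(neighbors, key=lambda n: torus_distance(neighbors[n], dest))

# Example matching the figure: D holds neighbors {C, E}; a message for point
# (0.2, 0.4) goes to whichever of the two is closer (zone centers assumed).
route = next_hop({"C": (0.375, 0.25), "E": (0.625, 0.25)}, (0.2, 0.4))
```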

12 "I want to join!" --- CAN Construction
Find a node already in the CAN
  - Retrieve a bootstrap node's IP address by looking up the CAN domain name in DNS
  - The bootstrap node provides IP addresses of random nodes already in the CAN
Find a zone in the space
  - Randomly choose a point P in the space
  - Send a JOIN request toward P via an existing CAN node A
  - Node B, whose zone contains P, splits its zone and hands "half" of it, with the corresponding (key, value) pairs, to the new node
Join the routing by notifying the neighbors
  - The new node learns the IP addresses of its neighbor set from the previous occupant
  - The previous occupant updates its own neighbor set
  - Both new and old neighbors are informed of the reallocation

13 Figure: node H joins at point (0.2, 0.1), splitting A's zone.
Before: A's neighbor set = {B, G}; B's = {A, C, G}; G's = {A, B, F}.
After: A's neighbor set = {G, H}; B's = {C, G, H}; G's = {A, B, F, H}; H's = {A, B, G}.
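A sketch of the zone-split step in the join procedure above (the Zone class is hypothetical; splitting along the longest side keeps zones close to square, and keys whose points fall in the new half would move with it).

```python
class Zone:
    """Hypothetical zone representation: an axis-aligned box [lo, hi)."""
    def __init__(self, lo, hi):
        self.lo, self.hi = list(lo), list(hi)

    def split(self):
        """Halve this zone along its longest side; return the new half,
        which is handed to the joining node."""
        axis = max(range(len(self.lo)), key=lambda i: self.hi[i] - self.lo[i])
        mid = (self.lo[axis] + self.hi[axis]) / 2
        new_half = Zone(self.lo, self.hi)
        new_half.lo[axis] = mid   # new node takes the upper half...
        self.hi[axis] = mid       # ...the old occupant keeps the lower half
        return new_half

    def contains(self, p):
        return all(l <= x < h for l, x, h in zip(self.lo, p, self.hi))
```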

14 "I want to leave!" --- CAN Maintenance
The departing node's zone must be taken over by some node.
Normal procedure:
  - Explicitly hand over the zone and its (key, value) pairs to a neighbor
  - Merge into a valid zone if possible; otherwise the neighbor with the smallest zone handles both zones temporarily
Failure: immediate-takeover algorithm
  - Neighbors initialize takeover timers
  - The live neighbor with the smallest zone takes over the failed node's zone

15 Figure: Scenario 1: H leaves explicitly. Scenario 2: C leaves explicitly. Scenario 3: G dies.

16 "Can we do better?" --- Design Improvements (1)
Basic design:
  - Balances low per-node state (O(d)) against short path lengths (O(d n^{1/d}))
  - These are application-level hops, not IP-level hops: neighbor nodes may be geographically distant, many IP hops apart
Average total lookup latency = avg(# of CAN hops) x avg(latency of each CAN hop)

17 "Can we do better?" --- Design Improvements (2)
Primary goal: reduce the latency of CAN routing
  - Path length
  - Per-CAN-hop latency
Other goals:
  - Robustness: routing and data availability
  - Load balancing
The CAN design was simulated on Transit-Stub (TS) topologies produced by the GT-ITM topology generator.

18 Multi-Dimensioned Coordinate Spaces (1)
Increasing the number of dimensions:
  - reduces path length (and hence latency)
  - increases the size of the coordinate routing table, which nonetheless remains small (O(d))
  - improves fault tolerance (more next-hop choices)
Figure: effect of dimensions on path length.

19 Realities: Multiple Coordinate Spaces (2)
Maintain multiple independent coordinate spaces, called realities.
For a CAN with r realities:
  - Each node is assigned r zones and holds r independent neighbor sets
  - The contents of the hash table are replicated in each reality
Figure: r = 2 realities; key k1 hashes to point (0.6, 0.8) in both realities, so (k1, v1) is stored at a different node in each.

20 Realities: Multiple Coordinate Spaces (2)
Increasing the number of realities:
  - improves data availability
  - improves fault tolerance
  - reduces path length (latency)
Figure: effect of multiple realities on path length.

21 Dimensions vs. Realities
  - More dimensions have the greater effect on path length
  - More realities provide stronger fault tolerance and increased data availability
Figure: path length with increasing neighbor state.

22 Better CAN Routing Metrics (3)
(The basic CAN routing metric is Cartesian distance in the virtual coordinate space.)
RTT-weighted routing (RTT = round-trip time):
  - Reflects the underlying IP topology
  - Each node measures the RTT to each of its neighbors
  - Forward the message to the neighbor with the maximum ratio of progress to RTT
  - Reduces the latency of individual hops
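A sketch of the progress-to-RTT rule under stated assumptions (data layouts are illustrative): instead of pure Cartesian progress, the forwarding step picks the neighbor that makes the most coordinate progress per millisecond of RTT.

```python
def torus_distance(p, q):  # as in the routing sketch above
    return sum(min(abs(a - b), 1 - abs(a - b)) ** 2 for a, b in zip(p, q)) ** 0.5

def next_hop_rtt(here, dest, neighbors, rtt):
    """neighbors: {id: zone center point}; rtt: {id: measured RTT in ms}.
    Pick the neighbor with the best ratio of coordinate progress to RTT."""
    def progress(n):
        return torus_distance(here, dest) - torus_distance(neighbors[n], dest)
    forward = [n for n in neighbors if progress(n) > 0]  # only hops that make progress
    return max(forward, key=lambda n: progress(n) / rtt[n])
```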

23 Overloading Coordinate Zones (4)
Multiple nodes may share the same zone, up to a threshold MAXPEERS.
Each node knows all peers in its own zone and one node in each of its neighbor zones.
How are overloaded zones managed when node A joins node B's zone?
  - If B's zone holds fewer than MAXPEERS peers: A joins B's zone and replicates its (key, value) pairs
  - Otherwise: split B's zone, divide B's peer list, and divide the (key, value) pairs
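A minimal sketch of this decision, assuming a simple in-memory layout (MAXPEERS and the branch structure come from the slide; everything else is hypothetical):

```python
MAXPEERS = 4  # threshold from the slide; the value 4 is arbitrary here

def handle_join(peers, store, new_node):
    """peers: nodes currently sharing this zone; store: the zone's (key, value) dict.
    new_node either joins the zone or triggers a split, per the flow above."""
    if len(peers) < MAXPEERS:
        peers.append(new_node)          # A joins B's zone...
        new_node.store = dict(store)    # ...and replicates the key-value pairs
    else:
        half = len(peers) // 2          # split B's zone:
        new_node.peers = peers[half:] + [new_node]  # divide B's peer list...
        del peers[half:]
        # ...and divide the key-value pairs: keys whose hashed points fall in
        # the new half-zone move to the new peer group (geometry omitted)
```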

24 Overloading Coordinate Zones (4)
Advantages:
  - Reduced path length (number of hops)
  - Reduced per-hop latency
  - Improved fault tolerance
Disadvantage:
  - Adds somewhat to system complexity

25 Multiple Hash Functions (5)
Use k different hash functions to map each key onto k points, and replicate (key, value) at k distinct nodes.
  - Improves data availability
  - Reduces average query latency (queries can be issued in parallel)
  - Increases the size of the (key, value) database and the query traffic by a factor of k
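One way to sketch this, assuming the k hash functions are derived by salting SHA-1 with the function index (the paper does not prescribe a construction):

```python
import hashlib

def points_for_key(key: str, k: int = 3, d: int = 2):
    """Derive k 'independent' hash functions by salting SHA-1 with the
    function index; (key, value) is replicated at each of the k points."""
    points = []
    for i in range(k):
        digest = hashlib.sha1(f"h{i}:{key}".encode()).digest()
        step = len(digest) // d                      # bytes per coordinate
        points.append(tuple(
            int.from_bytes(digest[a * step:(a + 1) * step], "big") / 2**(8 * step)
            for a in range(d)))
    return points

# A query for the key can be routed toward all k points in parallel,
# taking whichever replica answers first.
```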

26 Topologically-Sensitive Construction of the CAN Overlay Network (6)
Problem: randomly allocating nodes to zones can produce strange, inefficient routing.
Goal: construct a CAN overlay that is congruent with the underlying IP topology.
Landmarks: a well-known set of machines, such as DNS servers.
Each node measures its RTT to each landmark:
  - It orders the landmarks by increasing RTT
  - For m landmarks there are m! possible orderings
Partition the coordinate space into m! equal-size partitions.
A node joins the CAN at a random point in the partition corresponding to its landmark ordering.

27 Figure: m = 3 landmarks give m! = 6 partitions, one per ordering (l1 l2 l3, l1 l3 l2, l2 l1 l3, l2 l3 l1, l3 l1 l2, l3 l2 l1). Node H, with landmark ordering (l2 l3 l1), joins at a random point, e.g. (0.8, 0.6), inside the corresponding partition.
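A sketch of the binning step described above (function and variable names are assumptions): a joining node measures its RTT to the m landmarks, and the resulting ordering selects one of the m! partitions.

```python
from itertools import permutations

def landmark_partition(rtts):
    """rtts[i] is the measured RTT to landmark l(i+1). Returns the index
    (0 .. m!-1) of the partition matching this node's landmark ordering."""
    m = len(rtts)
    ordering = tuple(sorted(range(m), key=lambda i: rtts[i]))  # increasing RTT
    return list(permutations(range(m))).index(ordering)

# Example matching the figure: rtts = [40, 10, 25] orders the landmarks as
# (l2, l3, l1), selecting one of the 3! = 6 partitions; the node then joins
# at a random point inside that partition.
print(landmark_partition([40, 10, 25]))  # -> 3
```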

28 Topologically-Sensitive Construction of the CAN Overlay Network (6)
Evaluation metric: latency stretch = (latency on the CAN overlay) / (average latency on the IP network).
The coordinate space is no longer uniformly populated, so background load-balancing techniques are needed.

29 Load Balancing Techniques (7)
More uniform partitioning -- volume balancing:
  - When a JOIN is received, a node compares its own zone volume with its neighbors' zone volumes
  - The zone with the largest volume is split
  - Result: zone volumes approach V = V_T / n, where V_T is the total volume of the space
Caching and replication for "hot spots":
  - A node caches the data keys it has recently accessed
  - "Hot" data keys are replicated at neighboring nodes

Effect of volume balancing:
Type       % of nodes with volume V    Largest zone volume
without    40%                         8V
with       90%                         4V

30 Design Review
Simulation parameters (n = 2^18 nodes):
  - d: number of dimensions
  - r: number of realities
  - p: number of peers per zone
  - k: number of hash functions

31 Topology-Aware Routing in CAN
"It is critical for overlay routing to be aware of the network topology." [6]

32 Figure: the same four nodes A, B, C, D arranged differently in the overlay and in the physical network, illustrating how overlay neighbors can be physically distant.

33 Approaches to Topology-Aware Routing in CAN
  - Proximity routing
  - Topology-based nodeId assignment
  - Proximity neighbor selection (does not work on CAN, since a node's neighbors are fixed by the coordinate space)

34 Proximity Routing
The overlay is constructed without regard for the physical network topology.
Among the set of possible next hops, select the one that:
  - is closest in the physical network, or
  - represents a good compromise between progress in the id space and proximity
Example: the RTT-weighted CAN routing metric above.

35 Topology-Based NodeId Assignment
Map the overlay's logical id space onto the physical network.
Example 1: landmark binning [5][7]
  - Destroys the uniform population of the id space
  - Landmark sites can become overloaded
  - Coarse-grained: relatively close nodes are difficult to distinguish
Example 2: SAT-Match [8]

36 Self-Adaptive Topology (SAT) Matching
Two phases:
  - Probing phase: as soon as a node joins the system, it probes nearby nodes for distance measurements
  - Jumping phase: the node picks the closest zone accordingly and jumps to it
The process iterates until the node is close enough to the zone where all of its physically close neighbors are located.
In this way a global topology-matching optimization is achieved.

37 SAT-Matching -- Probing Phase (1)
Effective flooding: flooding with a small TTL is highly effective and produces few redundant messages.
Having joined the system via the normal DHT assignment, the source node floods a message to its neighbors containing:
  - (source IP address, source timestamp, small TTL k)
A node that receives the message responds to the source with its own IP address, and re-floods the message to its neighbors with TTL k-1 if k-1 > 0.

38 SAT-Matching -- Probing Phase (2)
Figure: example showing the TTL-2 neighborhood of the source node, node 2's TTL-2 neighborhood, and node 13's TTL-2 neighborhood.

39 SAT-Matching -- Probing Phase (3)
  - After collecting a list of IP addresses, the source node uses a ping facility to measure the RTT (round-trip time) to each node that responded
  - It sorts these RTTs and selects the two nodes with the smallest RTTs
  - It then selects the zone associated with one of these two nodes to jump to
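A sketch of the whole probing phase under stated assumptions (the message fields come from slide 37; the node API -- ip, neighbors, receive_probe, send_response, ping -- is hypothetical, and suppression of duplicate probes by (src_ip, ts) is omitted):

```python
import time

def start_probe(source, ttl=2):
    """Source floods (IP address, timestamp, small TTL) to its overlay neighbors."""
    msg = {"src_ip": source.ip, "ts": time.time(), "ttl": ttl}
    for nb in source.neighbors:
        nb.receive_probe(msg)

def receive_probe(node, msg):
    node.send_response(msg["src_ip"], node.ip)   # report own IP to the source
    if msg["ttl"] - 1 > 0:                       # re-flood with decremented TTL
        smaller = dict(msg, ttl=msg["ttl"] - 1)
        for nb in node.neighbors:
            nb.receive_probe(smaller)

def closest_two(source, responders):
    """Ping every responder, sort by RTT, keep the two closest (slide 39)."""
    return sorted(responders, key=lambda ip: source.ping(ip))[:2]
```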

40 SAT-Matching -- Jumping Phase
If a node simply jumped to the closest zone, the stretch reduction could be offset by latency increases on the other new connections.
Jumping criterion:
  - Jump only when the local stretch of the source's and the sink's (the two closest nodes') TTL-1 neighborhoods is reduced
How does X jump to Y's zone?
  - X returns its zone and its (key, value) pairs to one of its neighbors
  - Y allocates half of its zone and the corresponding pairs to X

41 Discussion
Is there a way to make structured P2P more practical?
  - It is not free for peers: in CAN, peers are required to follow the protocol and be responsible for some data keys
  - Only exact-key search is supported
Can topology-aware routing be done in a cost-effective manner in a highly dynamic environment?

42 Conclusion
CAN, Chord, Pastry, and Tapestry are representative structured P2P systems:
  - Self-organizing
  - Scalable
  - Distributed DHT
  - Fault-tolerant

System      Size of routing table    Number of hops
CAN         O(d)                     O(n^{1/d})
Chord       O(log n)                 O(log n)
Pastry      O(log n)                 O(log n)
Tapestry    O(log n)                 O(log n)

43 References
[1] Hari Balakrishnan, M. Frans Kaashoek, David Karger, Robert Morris, and Ion Stoica, "Looking Up Data in P2P Systems", Communications of the ACM, Vol. 46, No. 2, February 2003.
[2] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, and Scott Shenker, "A Scalable Content-Addressable Network", ACM SIGCOMM 2001.
[3] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, and Hari Balakrishnan, "Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications", ACM SIGCOMM 2001.
[4] Antony Rowstron and Peter Druschel, "Pastry: Scalable, decentralized object location and routing for large-scale peer-to-peer systems", Middleware 2001.
[5] B. Zhao, J. Kubiatowicz, and A. Joseph, "Tapestry: An infrastructure for fault-tolerant wide-area location and routing", U.C. Berkeley, 2001.
[6] M. Castro, P. Druschel, Y. C. Hu, and A. Rowstron, "Topology-aware routing in structured peer-to-peer overlay networks", presented at the Intl. Workshop on Future Directions in Distributed Computing, June 2002.
[7] S. Ratnasamy, M. Handley, R. Karp, and S. Shenker, "Topologically-aware overlay construction and server selection", IEEE INFOCOM 2002.
[8] Shansi Ren, Lei Guo, Song Jiang, and Xiaodong Zhang, "SAT-Match: a self-adaptive topology matching method to achieve low lookup latency in structured P2P overlay networks", Proceedings of the 18th International Parallel and Distributed Processing Symposium (IPDPS'04), April 26-30, 2004.