
1 Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications (SIGCOMM 2001). Lecture slides by Dr. Yingwu Zhu

2 Motivation  How to find data in a distributed file-sharing system?  o Lookup is the key problem  [Figure: a publisher stores (Key="LetItBe", Value=MP3 data) on one of nodes N1..N5; a client issues Lookup("LetItBe") and must find that node]

3 Centralized Solution  o Central server (Napster)  o Requires O(M) state  o Single point of failure  [Figure: nodes N1..N5 register content with a central DB; the client's Lookup("LetItBe") goes to the DB, which points to the publisher of (Key="LetItBe", Value=MP3 data)]

4 Centralized: Napster  Simple centralized scheme  How to find a file?  Use a centralized index database  Clients contact the index node to upload their list of content at initialization  To locate a file  Send a query to central index node  Get the list of peer locations storing the file  Fetch the file directly from one of the peers
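A minimal sketch of this centralized-index idea in Python (the names CentralIndex, register, and lookup are illustrative, not Napster's actual protocol):

    # Hypothetical Napster-style central index: peers register their file lists,
    # and queries return peer addresses; the data never passes through the server.
    class CentralIndex:
        def __init__(self):
            self.index = {}                      # filename -> set of peer addresses

        def register(self, peer_addr, filenames):
            for name in filenames:               # called by each client at startup
                self.index.setdefault(name, set()).add(peer_addr)

        def lookup(self, filename):
            return sorted(self.index.get(filename, set()))   # peers to fetch from

    index = CentralIndex()
    index.register("198.10.10.1:6699", ["LetItBe.mp3", "HeyJude.mp3"])
    index.register("198.10.10.2:6699", ["LetItBe.mp3"])
    print(index.lookup("LetItBe.mp3"))   # the client then fetches directly from a peer

The single index object is exactly the centralized state and single point of failure noted on the previous slide.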

5 Distributed Solution (1)  o Flooding (Gnutella, Morpheus, etc.)  o Worst case O(N) messages per lookup  [Figure: the client's Lookup("LetItBe") is flooded across N1..N5 until it reaches the publisher of (Key="LetItBe", Value=MP3 data)]

6 Flooding: Gnutella  Find a node to get onto the system  Connect to a few other nodes for resilience  How to find a file?  Broadcast the request to all neighbors  On receiving a request, a node that doesn't have the file re-broadcasts to its other neighbors  A node that has the file passes its info back to the requester  Transfers are then done over HTTP directly between the query-originating node and the content-owning node
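A rough sketch of TTL-scoped flooding over an in-memory neighbor graph (the Node class and flood function are illustrative, not the Gnutella wire protocol):

    # Illustrative TTL-scoped flooding: each node re-broadcasts a query to its
    # neighbors until the TTL runs out or a node holding the file is found.
    class Node:
        def __init__(self, name, files):
            self.name, self.files, self.neighbors = name, set(files), []

    def flood(node, filename, ttl, seen=None):
        seen = seen if seen is not None else set()
        if node.name in seen:                    # drop duplicate copies of the query
            return []
        seen.add(node.name)
        if filename in node.files:               # owner answers the requester
            return [node.name]
        if ttl == 0:
            return []
        hits = []
        for nb in node.neighbors:                # re-broadcast with decremented TTL
            hits += flood(nb, filename, ttl - 1, seen)
        return hits

    a, b, c = Node("A", []), Node("B", []), Node("C", ["LetItBe.mp3"])
    a.neighbors, b.neighbors = [b], [c]
    print(flood(a, "LetItBe.mp3", ttl=2))        # ['C']; with ttl=1 the query dies at B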

7 Flooding: Gnutella  Advantages:  Totally decentralized, highly robust  Disadvantages:  Not scalable  Can flood the whole network with requests (TTL scoping can be used to limit large-scale flooding)

8 Flooding: FastTrack (aka Kazaa)  Modifies the Gnutella protocol into a two-level hierarchy  Hybrid of Gnutella and Napster  Supernodes  Nodes with better connections to the Internet  Act as temporary indexing servers for other nodes  Help improve the stability of the network  Standard nodes  Connect to supernodes and report their list of files  Allows slower nodes to participate  Search  Broadcast (Gnutella-style) search across supernodes  Disadvantages  Still kept a centralized registration step

9 Distributed Solution (2)  o Routed messages (Freenet, Tapestry, Chord, CAN, etc.)  o Only exact matches  [Figure: the client's Lookup("LetItBe") is routed hop by hop through N1..N5 to the node storing (Key="LetItBe", Value=MP3 data)]

10 What is Chord? What does it do?  In short: a peer-to-peer lookup service  Solves the problem of locating a data item in a collection of distributed nodes, even under frequent node arrivals and departures  The core operation in most p2p systems is efficient location of data items  Supports just one operation: given a key, it maps the key onto a node  Lookup(key) → IP address

11 Chord Properties  Efficient: O(log N) messages per lookup  with "high probability"  N is the total number of servers  Scalable: O(log N) state per node  Robust: survives massive changes in membership  Fully decentralized  Self-organizing

12 Chord IDs  o m-bit identifier space for both keys and nodes  o Key identifier = SHA-1(key), e.g. Key="LetItBe" → ID=54  o Node identifier = SHA-1(IP address), e.g. IP="198.10.10.1" → ID=56  o Both are uniformly distributed  o How to map key IDs to node IDs?
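A small sketch of deriving m-bit Chord IDs with Python's hashlib (truncating the SHA-1 digest to m bits is the idea; the byte order used here is an assumption, so the printed IDs need not be 54 and 56):

    import hashlib

    def chord_id(text, m=7):
        # SHA-1 the string, then keep only m bits so IDs live on a 2^m circle.
        digest = hashlib.sha1(text.encode()).digest()
        return int.from_bytes(digest, "big") % (2 ** m)

    print(chord_id("LetItBe"))        # key identifier
    print(chord_id("198.10.10.1"))    # node identifier derived from the IP address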

13 Consistent Hashing [Karger 97]  Circular 7-bit ID space  o A key is stored at its successor: the node with the next-higher ID  [Figure: node IP="198.10.10.1" and key Key="LetItBe" placed on the circle]
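A sketch of the successor rule over a sorted list of node IDs (a global view of the ring, purely for illustration):

    import bisect

    def successor(node_ids, key_id, m):
        # The key is stored at the first node whose ID is >= the key ID,
        # wrapping around the 2^m circle if necessary.
        ids = sorted(node_ids)
        i = bisect.bisect_left(ids, key_id % (2 ** m))
        return ids[i % len(ids)]

    ring = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
    print(successor(ring, 54, m=6))   # -> 56: key 54 is stored at N56
    print(successor(ring, 60, m=6))   # -> 1: wraps around to the lowest ID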

14 Consistent Hashing [Karger 97]  Circular 7-bit ID space  Where is "LetItBe"? lookup(54)  o If every node knows of every other node:  o requires global information  o routing tables are large, O(N)  o lookups are fast, O(1)

15 Chord: Basic Lookup  Circular 7-bit ID space  o Every node knows its successor in the ring  o A lookup that simply follows successor pointers requires O(N) time
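A sketch of this successor-only lookup, walking the ring one hop at a time (the dictionary is a stand-in for per-node successor pointers, not an RPC interface):

    RING = 2 ** 6                                # 6-bit identifier space for the example

    def between(x, a, b):
        # x in the ring interval (a, b]; by convention (a, a] covers the whole ring.
        d_x, d_b = (x - a) % RING, (b - a) % RING
        return 0 < d_x <= (d_b or RING)

    def basic_lookup(start, key, successor_of):
        # Follow successor pointers until key falls in (n, successor(n)]:
        # O(N) hops in the worst case.
        n = start
        while not between(key, n, successor_of[n]):
            n = successor_of[n]
        return successor_of[n]

    nodes = [8, 14, 21, 32, 38, 42, 48, 51, 56]
    succ = {n: nodes[(i + 1) % len(nodes)] for i, n in enumerate(nodes)}
    print(basic_lookup(8, 54, succ))             # -> 56, after visiting most of the ring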

16 Acceleration of Lookups  Lookups are accelerated by maintaining additional routing information  Each node maintains a routing table with (at most) m entries, called the finger table (where 2^m is the size of the identifier space)  The i-th entry in the table at node n contains the identity of the first node, s, that succeeds n by at least 2^(i-1) on the identifier circle (clarification on the next slide)  s = successor(n + 2^(i-1)) (all arithmetic mod 2^m)  s is called the i-th finger of node n, denoted s = n.finger(i)

17 Acceleration of Lookups  Every node knows (at most) m other nodes in the ring  Store them in a "finger table" with m entries or "fingers"  Entry i in the finger table of node n points to: n.finger[i] = successor(n + 2^(i-1)) (i.e. the first node ≥ n + 2^(i-1) on the circle)  n.finger[i] is referred to as the i-th finger of n  [Figure: node N80's fingers at 80 + 2^0, 2^1, ..., 2^6 pointing to N96, N118, and N19]
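A sketch of filling in one node's finger table from a global list of node IDs (the global view is an assumption for illustration; a real node would learn its fingers through lookups):

    def finger_table(n, node_ids, m):
        # finger[i] = successor(n + 2^(i-1)) for i = 1..m on the 2^m circle.
        ids = sorted(node_ids)
        table = {}
        for i in range(1, m + 1):
            start = (n + 2 ** (i - 1)) % (2 ** m)
            table[i] = next((x for x in ids if x >= start), ids[0])   # wrap to lowest ID
        return table

    print(finger_table(8, [8, 14, 21, 32, 38, 42, 48, 51, 56], m=6))
    # {1: 14, 2: 14, 3: 14, 4: 21, 5: 32, 6: 42} -- the N8 table used in a later slide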

18 Lookup Example  What about looking up key 17 from N8: choose N21 or N14?  17 is more than N8+8 = 16 but less than N21  N8 actually chooses N14, its closest preceding finger (the finger table may not be accurate)  Only the successor of each node is assumed to be correct

19 What does each node maintain?  Each node maintains a successor and a predecessor  Only the successor is assumed to always be valid  The predecessor is maintained too, but it is not used in searching

20 Lookup algorithm  finger[4] points to where node n thinks key n + 2^(4-1) should be stored  When resolving an id, return finger[3] even if n + 2^(4-1) < id: you cannot assume there are no nodes between n + 2^(4-1) and finger[4]  [Figure: two placements of id relative to n, finger[2], finger[3], and finger[4], showing which finger is returned]

21 Comments on lookup  Two important characteristics of finger tables  Each node stores info about only a small number of others  A node does not generally have enough info to directly determine the successor of an arbitrary key k

22 Scalable key location: Lookup(54) starting at N8 (m = 6)  Ring nodes: N8, N14, N21, N32, N38, N42, N48, N51, N56  Finger table of N8: N8+1→N14, N8+2→N14, N8+4→N14, N8+8→N21, N8+16→N32, N8+32→N42  Finger table of N42: N42+1→N48, N42+2→N48, N42+4→N48, N42+8→N51, N42+16→N8  N8 forwards Lookup(54) to N42, its closest preceding finger for key 54; N42 forwards to N51; the successor of N51, namely N56, holds K54  Why each hop makes large progress: if node n forwards toward the key's predecessor p via finger f, the distance between n and f is at least 2^(i-1) while the distance between f and p is at most 2^(i-1)
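A hedged re-creation of this Lookup(54) walk in Python: finger tables are computed from a global view of the ring, and routing hops through the closest preceding finger (the function names and the global view are assumptions; the real protocol does this with per-node state and RPCs):

    RING, NODES = 64, sorted([8, 14, 21, 32, 38, 42, 48, 51, 56])    # m = 6

    def succ_of_id(i):
        # First node whose ID is >= i on the circle, wrapping past the top.
        return next((x for x in NODES if x >= i % RING), NODES[0])

    def node_successor(n):
        return succ_of_id((n + 1) % RING)

    def between(x, a, b, incl_right):
        # x in (a, b) or (a, b] on the circle; (a, a] means the whole ring.
        d_x, d_b = (x - a) % RING, ((b - a) % RING) or RING
        return 0 < d_x < d_b or (incl_right and d_x == d_b)

    def closest_preceding_finger(n, key):
        for i in range(6, 0, -1):                          # scan fingers highest first
            f = succ_of_id(n + 2 ** (i - 1))
            if between(f, n, key, incl_right=False):
                return f
        return n

    def lookup(start, key):
        n, path = start, [start]
        while not between(key, n, node_successor(n), incl_right=True):
            n = closest_preceding_finger(n, key)
            path.append(n)
        return path + [node_successor(n)]

    print(lookup(8, 54))   # [8, 42, 51, 56]: N8 -> N42 -> N51 -> N56, which stores K54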

23 Faster Lookups: O(log N) hops with high probability  [Figure: Lookup(K19) routed across a ring of nodes N5, N10, N20, N32, N60, N80, N99, N110 toward the node responsible for K19]

24 o Three step process: o Initialize all fingers of new node o Updating Successor and Predecessor o Transfer keys from successor to new node o Less aggressive mechanism (lazy finger update): o Initialize only the finger to successor node o Periodically verify immediate successor, predecessor o Periodically refresh finger table entries Joining the Ring
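A hedged sketch of the lazy-update idea, loosely following the stabilize/notify scheme from the Chord paper (single-process objects, no RPCs or failures; names simplified):

    RING = 64

    def strictly_between(x, a, b):
        # x strictly inside the ring interval (a, b); (a, a) covers the whole circle.
        d_x, d_b = (x - a) % RING, ((b - a) % RING) or RING
        return 0 < d_x < d_b

    class Node:
        def __init__(self, ident):
            self.id, self.successor, self.predecessor = ident, self, None

        def stabilize(self):
            # Ask our successor for its predecessor; adopt it if it sits between us.
            x = self.successor.predecessor
            if x is not None and strictly_between(x.id, self.id, self.successor.id):
                self.successor = x
            self.successor.notify(self)

        def notify(self, candidate):
            # candidate believes it might be our predecessor.
            if self.predecessor is None or strictly_between(candidate.id, self.predecessor.id, self.id):
                self.predecessor = candidate

    a, b = Node(8), Node(42)
    a.successor, b.successor = b, a              # two-node ring: N8 <-> N42
    c = Node(36)
    c.successor = b                              # N36 joins knowing only its successor
    for _ in range(3):                           # a few periodic rounds repair the ring
        for node in (a, b, c):
            node.stabilize()
    print(a.successor.id, c.successor.id, b.predecessor.id)    # 36 42 36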

25 Joining the Ring - Step 1  o Initialize the new node's finger table  o Locate any node p already in the ring  o Ask node p to look up the fingers of new node N36  o Return the results to new node N36  [Figure: N36 issues Lookup(37,38,40,…,100,164); the ring contains N5, N20, N40, N60, N80, N99]

26 Joining the Ring - Step 2  o Update successor and predecessor pointers  o The new node's successor is identified by finger[1]  o Ask that successor about its predecessor  o Tell your new successor that you are its predecessor  o Tell the predecessor of your successor that you are its successor  [Figure: N36 links itself in between N20 and N40]

27 Joining the Ring - Step 3  o Transfer keys from the successor node to the new node  o Only keys in the new node's range are transferred  [Figure: keys 21..36 are copied from N40 to N36, so K30 moves to N36 while K38 stays at N40]
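A tiny sketch of the range test for this step: only keys in (predecessor, new node] move from the old successor (the stored values and ring size here are made up):

    RING = 64

    def in_range(k, a, b):
        # k in the ring interval (a, b].
        d_k, d_b = (k - a) % RING, ((b - a) % RING) or RING
        return 0 < d_k <= d_b

    old_successor_keys = {30: "mp3 data", 38: "mp3 data"}     # keys held by N40
    pred, new_node = 20, 36
    moved = {k: v for k, v in old_successor_keys.items() if in_range(k, pred, new_node)}
    print(sorted(moved))   # [30]: K30 moves to N36, K38 stays at N40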

28 Handling Failures  o Failure of nodes might cause incorrect lookups  o In the figure, N80 doesn't know the correct successor, so Lookup(90) fails  o Correct successor pointers are enough for correctness  [Figure: ring with N10, N80, N85, N102, N113, N120; Lookup(90)]

29 Handling Failures  o Use a successor list  o Each node knows its r immediate successors  o After a failure, a node will know its first live successor  o Correct successors guarantee correct lookups  o The guarantee holds with some probability  o r can be chosen to make the probability of lookup failure arbitrarily small
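A tiny illustrative sketch of failing over to the first live entry of a successor list (node numbers reuse the previous figure; the liveness set is made up):

    def first_live_successor(successor_list, alive):
        for s in successor_list:                 # ordered nearest-first on the ring
            if s in alive:
                return s
        return None                              # all r successors failed

    succ_list_of_n80 = [85, 102, 113]            # r = 3 backups kept by N80
    alive = {10, 80, 102, 113, 120}              # suppose N85 has failed
    print(first_live_successor(succ_list_of_n80, alive))   # -> 102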

30 Simulation Setup  Packet-level simulation  Packet delays drawn from an exponential distribution with a mean of 50 ms  Iterative lookup procedure  Stabilization protocol invoked every two minutes on average  24-bit identifiers used

31 Load Balancing  Parameters:  10,000 nodes  100,000 to 1,000,000 keys, in increments of 100,000  Each experiment repeated 20 times

32 Load Balancing

33

34 Modification  Provide more uniform coverage of the identifier space  Allocate r virtual nodes to each real node  r = 1, 2, 5, 10, and 20
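A sketch of placing r virtual nodes per real node on the ring (hashing "address#index" is an assumption about how virtual IDs could be derived, not necessarily the paper's exact method):

    import hashlib

    def virtual_ids(real_node_addr, r, m=24):
        # Each real node gets r identifiers; keys then spread over all virtual IDs,
        # smoothing the number of keys per real node.
        ids = []
        for i in range(r):
            h = hashlib.sha1(f"{real_node_addr}#{i}".encode()).digest()
            ids.append(int.from_bytes(h, "big") % (2 ** m))
        return ids

    print(virtual_ids("198.10.10.1", r=5))       # five positions on the 24-bit ring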

35 Modification

36 Tradeoff  r times the space is needed to store routing tables  For N = 1,000,000 nodes, a finger table has about log2 N ≈ 20 entries, so with r = 20 virtual nodes a real node stores about 400 entries

37 Path Length  In theory, O(log N) messages are needed for a lookup  Simulated with 2^k nodes and 100 × 2^k keys  On average, 100 keys per node

38 Path Length

39 Path Length (N = 2^12)

40 Node Failures  10,000-node network  1,000,000 keys  A fraction p of the nodes fail

41 Node Failures  [Figure: fraction of failed lookups vs. the fraction p of failed nodes]

42 Summary  Basic operation: lookup(key) → IP address  Deterministic node placement and data placement  In an N-node network:  O(1/N) fraction of keys moved when an Nth node joins or leaves the network  O(log N) nodes maintained in the routing table  O(log N) messages per lookup  O((log N)^2) messages for updates when a node joins or leaves

43 Questions?

