3 What is an overlay network? A logical network implemented on top of a lower-layer networkCan recursively build overlay networksAn overlay link is defined by the application
4 Ex: Virtual Private Networks Links are defined as IP tunnelsMay include multiple underlying routers
5 Unstructured Overlay Networks Overlay links form random graphsNo defined structureExamplesGnutella: links are peer relationshipsOne node that runs Gnutella knows some other Gnutella nodesBitTorrentA node and nodes in its view
6 Structured NetworksA node forms links with specific neighbors to maintain a certain structure of the networkProsMore efficient data lookupMore reliableConsDifficult to maintain the graph structureExamplesEnd-system multicast: overlay nodes form a multicast treeDistributed Hash Tables
7 DHT Overview Used in the real world What problems do DHTs solve? BitTorrent tracker implementationContent distribution networksWhat problems do DHTs solve?How are DHTs implemented?
8 BackgroundA hash table is a data structure that stores (key, object) pairs.Key is mapped to a table index via a hash function for fast lookup.Content distribution networksGiven an URL, returns the object
9 Example of a Hash table: a web cache Page content…….…..…12Client requestsWeb cache returns the page content located at the 1st entry of the table.
10 DHT: why?If the number of objects is large, it is impossible for any single node to store it.Solution: distributed hash tables.Split one large hash table into smaller tables and distribute them to multiple nodes
12 A content distribution network A single provider that manages multiple replicasA client obtains content from a close replica
13 Basic function of DHT DHT is a “virtual” hash table Input: a keyOutput: a data itemData Items are stored by a network of nodes.DHT abstractionOutput: the node that stores the keyApplications handle key and data item association.
14 DHT: a visual exampleKVKV(K1, V1)KVKVKVInsert (K1, V1)
15 DHT: a visual exampleKVKV(K1, V1)KVKVKVRetrieve K1
16 Desired goals of DHT Scalability: each node does not keep much state Performance: small look up latencyLoad balancing: no node is overloaded with a large amount of stateDynamic reconfiguration: when nodes join and leave, the amount of state moved from nodes to nodes is small.Distributed: no node is more important than others.
17 A straw man design Suppose all keys are integers (0, V1)(3, V2)(1, V3)(4, V4)12(2, V5)(5, V6)Suppose all keys are integersThe number of nodes in the network is nid = key % n
18 When node 2 dies A large number of data items need to be rehashed. (0, V1)(2, V5)(4, V4)(1, V3)(3, V2)1(5, V6)A large number of data items need to be rehashed.
19 Fix: consistent hashing A node is responsible for a range of keysWhen a node joins or leaves, the expected fraction of objects that must be moved is the minimum needed to maintain a balanced load.All DHTs implement consistent hashing
20 Basic components of DHTs Overlapping key and node identifier spaceHash(www.cnn.com/image.jpg) a n-bit binary stringNodes that store the objects also have n-bit string as their identifiersBuilding routing tablesNext hops (structure of a DHT)Distance functionsThese two determine the geometry of DHTsRing, Tree, Hybercubes, hybrid (tree + ring) etc.Handle node join and leaveLookup and store interface
22 Chord: basic ideaHash both node id and key into a m-bit one-dimension circular identifier spaceConsistent hashing: a key is stored at a node whose identifier is closest to the key in the identifier spaceKey refers to both the key and its hash value.
23 Chord: ring topology K5 N105 K20 Circular 7-bit ID space N32 N90 K80 Key 5K5Node 105N105K20Circular 7-bitID spaceN32Ids live in a single circular space.N90K80A key is stored at its successor: node with next higher ID
24 Chord: how to find a node that stores a key? Solution 1: every node keeps a routing table to all other nodesGiven a key, a node knows which node id is successor of the keyThe node sends the query to the successorWhat are the advantages and disadvantages of this solution?
25 Solution 2: every node keeps a routing entry to the node’s successor (a linked list) “Where is key 80?”N105N32“N90 has K80”K80N90N60
26 Simple lookup algorithm Lookup(my-id, key-id)n = my successorif my-id < n < key-idcall Lookup(key-id) on node n // next hopelsereturn my successor // doneCorrectness depends only on successorsQ1: will this algorithm miss the real successor?Q2: what’s the average # of lookup hops?Always undershoots to predecessor. So never misses the real successor.Lookup procedure isn’t inherently log(n). But finger table causes it to be.
27 Solution 3: “Finger table” allows log(N)-time lookups 1/81/161/32Small tables, but multi-hop lookup. Table entries: IP address and Chord ID.Navigate in ID space, route queries closer to successor. Log(n) tables, log(n) hops.Route to a document between ¼ and ½ …1/641/128N80Analogy: binary search
28 Finger i points to successor of n+2i-1 1121/81/161/321/641/128Small tables, but multi-hop lookup. Table entries: IP address and Chord ID.Navigate in ID space, route queries closer to successor. Log(n) tables, log(n) hops.Route to a document between ¼ and ½ …N80The ith entry in the table at node n contains the identity of the first node s that succeeds n by at least 2i-1A finger table entry includes Chord Id and IP addressEach node stores a small table log(N)
29 Chord finger table example 0+201Keys:5,60+2130+22123456723Keys:13354Keys:257
30 Lookup with fingers Lookup(my-id, key-id) look in local finger table forhighest node n s.t. my-id < n < key-idif n existscall Lookup(key-id) on node n // next hopelsereturn my successor // doneAlways undershoots to predecessor. So never misses the real successor.Lookup procedure isn’t inherently log(n). But finger table causes it to be.
32 Node join Maintain the invariant Each node’ successor is correctly maintainedFor every node k, node successor(k) answers for key k. It’s desirable that finger table entries are correctEach nodes maintains a predecessor pointerTasks:Initialize predecessor and fingers of new nodeUpdate existing nodes’ stateNotify apps to transfer state to new node
33 Chord Joining: linked list insert 1. Lookup(36)K30K38N40Node n queries a known node n’ to initialize its statefor its successor: lookup (n)
34 Join (2)N252. N36 sets its ownsuccessor pointerN36K30K38N40
35 Join (3) Note that join does not make the network aware of n N25 3. Copy keysfrom N40 to N36N36K30K30K38N40Concurrent join and stabilization are provably consistent eventually.Note that join does not make the network aware of n
36 Join (4): stabilize N25 4. Set N25’s successor pointer N36 K30 N40 K38 Stabilize 1) obtains a node n’s successor’s predecessor x, and determines whether x should be n’s successor 2) notifies n’s successor n’s existenceN25 calls its successor N40 to return its predecessorSet its successor to N36Notifies N36 it is predecessorUpdate finger pointers in the background periodicallyFind the successor of each entry iCorrect successors produce correct lookups
37 Failures might cause incorrect lookup No problem until lookup gets to a node which knows of no node < key.There’s a replica of K90 at N113, but we can’t find it.N80N80 doesn’t know correct successor, so incorrect lookup
38 Solution: successor lists Each node knows r immediate successorsAfter failure, will know first live successorCorrect successors guarantee correct lookupsGuarantee is with some probabilityHigher layer software can be notified to duplicate keys at failed nodes to live successors
39 Choosing the successor list length Assume 1/2 of nodes failP(successor list all dead) = (1/2)rI.e. P(this node breaks the Chord ring)Depends on independent failureP(no broken nodes) = (1 – (1/2)r)Nr = 2log(N) makes prob. = 1 – 1/N
40 Lookup with fault tolerance Lookup(my-id, key-id)look in local finger table and successor-listfor highest node n s.t. my-id < n < key-idif n existscall Lookup(key-id) on node n // next hopif call failed,remove n from finger tablereturn Lookup(my-id, key-id)else return my successor // doneAlways undershoots to predecessor. So never misses the real successor.Lookup procedure isn’t inherently log(n). But finger table causes it to be.
41 Chord performance Per node storage Lookup latency Ideally: K/N Implementation: large variance due to unevenly node id distributionLookup latencyO(logN)
42 Comments on Chord Later work fixes these issues ID distance Network distanceReducing lookup latency and localityStrict successor selectionCan’t overshootAsymmetryA node does not learn its routing table entries from queries it receivesLater work fixes these issues
43 Conclusion Overlay networks Structured vs Unstructured Design of DHTs Chord
44 Lab 3 Congestion Control This lab is based on Lab 1, you don’t have to change much to make it work.You are required to implement a congestion control algorithmFully utilize the bandwidthShare the bottleneck fairlyWrite a report to describe your algorithm design and performance analysisYou may want to implement at leastSlow startCongestion avoidanceFast retransmit and fast recoveryRTO estimatorNew RENO is a plus. It handles multiple packets loss very well.
45 Lab 3 Congestion Control A1 on linux21transmitting file1B1 on linux23transmitting file2A2 on linux22receiving file1B2 on linux24receiving file2relayer on linux25simulating the bottleneckport: 10000port: 20000port: 40000port: 30000port: 50001port: 50003port: 50002port: 50004bottleneck link L