CAN
1. Distributed Hash Tables
   a) DHT recap
   b) Uses
   c) Example – CAN

What are DHTs?
A DHT is an overlay topology that provides functionality similar to a typical hash table:
  – put(key, value)
  – get(key)
Peers are the buckets in the table, each with its own local hash table.
A peer publishes a resource onto the network under a key, and the key determines where the data will be stored (i.e. which peer will receive the data).
Using keys presupposes a logical ‘space’ which the keys map onto.
  – The key is mapped to the space using a hashing function, to spread resources evenly across the network.
  – Nodes are responsible for sections of this space.
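The put/get interface above can be illustrated with a minimal single-process sketch. It assumes a one-dimensional key space cut into equal slices, one per peer; the names `Peer` and `ToyDHT` are hypothetical and there is no networking, so this only shows how hashing a key selects the responsible peer.

```python
import hashlib


class Peer:
    """One bucket of the DHT, holding its own local hash table."""

    def __init__(self, name):
        self.name = name
        self.store = {}


class ToyDHT:
    """Hypothetical in-memory stand-in for a DHT with a 1-D key space."""

    def __init__(self, peer_names):
        self.peers = [Peer(name) for name in peer_names]

    def _owner(self, key):
        # Hash the key to a 64-bit integer, then scale it to a peer index.
        # Integer arithmetic keeps the index strictly below len(self.peers).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        return self.peers[(h * len(self.peers)) >> 64]

    def put(self, key, value):
        self._owner(key).store[key] = value

    def get(self, key):
        return self._owner(key).store.get(key)


dht = ToyDHT(["p0", "p1", "p2", "p3"])
dht.put("song.mp3", "some data")
print(dht.get("song.mp3"))
```

Both `put` and `get` hash the key the same way, which is why a lookup deterministically reaches the peer that stored the value.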

Why DHTs?
They address the flooding issue without resorting to a mixed centralized/decentralized architecture.
  – no super peers, no power law distribution
Search can typically be achieved in O(log n) hops, where n is the number of nodes in the network.
Only a few neighbours need to be known, typically O(log n).
Small neighbourhoods and a flat topology make for a robust network in which churn is easy to handle.
DHTs guarantee locating a file (or where it should be).
  – deterministic
  – unlike unstructured systems such as Gnutella and KaZaA

DHT Uses
Not just file sharing:
  – BitTorrent trackers in a DHT
  – super node-like caching (Skype possibly does this)
  – location-independent naming service
Typically DHTs are used as a backbone of relatively powerful nodes supporting weaker nodes
  – because DHTs are flat and presume equality of capability.

Content Addressable Network - CAN
CAN stands for ‘content-addressable network’.
It is a network that provides a routing overlay on top of a physical network to optimise publishing and searching for data.
It addresses the ‘flooding’ issue.
CAN is based on the Distributed Hash Table (DHT) concept.

How does CAN work?
The CAN space is defined as a d-dimensional Cartesian coordinate space.
At any given time the entire coordinate space is divided amongst the nodes in the system.
Each node owns its own distinct zone within the overall space.
A uniform hashing algorithm is used to map a key to a point in the coordinate space.
  – k -> coord{x, y, z}
  – The hash function is not specified by CAN,
  – as long as it has the properties of a uniform hash, i.e. even distribution.
Messages contain the destination coordinates
  – and are routed to the peer whose zone contains the coordinates.
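As a rough illustration of the key-to-coordinate mapping, the sketch below derives one coordinate per dimension from a SHA-256 digest. CAN does not specify the hash function, so the choice of SHA-256, the unit cube [0, 1)^d and the name `key_to_point` are all assumptions made for this example.

```python
import hashlib


def key_to_point(key, d=2):
    """Map a key to a point in the unit d-cube [0, 1)^d (assumed layout)."""
    if not 1 <= d <= 8:
        # SHA-256 gives 32 bytes; this sketch uses 4 bytes per dimension.
        raise ValueError("this sketch supports 1 to 8 dimensions")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return tuple(
        int.from_bytes(digest[4 * i:4 * i + 4], "big") / 2**32
        for i in range(d)
    )


print(key_to_point("song.mp3", d=3))
```

The same key always yields the same point, so any node can compute where a resource lives without asking anyone else.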

Example 2-D Space
Corners of the space: (0,0), (1,0), (0,1), (1,1)
Zones (x range, y range):
  A (0-0.5, 0-0.5)
  B (0-0.5, 0.5-1)
  C (0.5-1, 0-0.5)
  D (0.5-0.75, 0.5-1)
  E (0.75-1, 0.5-1)

Neighbours
A node is considered a neighbour if its zone overlaps along d-1 dimensions and abuts along one dimension.
A node maintains info about its neighbours: for each, a contact address and its zone coordinates.
An evenly divided space means each node has 2d neighbours.

Neighbours (torus - the space wraps)
  A (B, C)
  B (A, D, E)
  C (A, D, E)
  D (B, E, C)
  E (B, D, C)
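The neighbour rule can be checked mechanically against the example 2-D space. The sketch below is one possible encoding, assuming zones are stored as (low, high) ranges per dimension on a unit torus; the helper names are hypothetical.

```python
# Zones from the example 2-D space: ((x_lo, x_hi), (y_lo, y_hi)).
ZONES = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "D": ((0.5, 0.75), (0.5, 1.0)),
    "E": ((0.75, 1.0), (0.5, 1.0)),
}


def overlaps(a, b):
    # The two ranges share more than a single boundary point.
    return a[0] < b[1] and b[0] < a[1]


def abuts(a, b, size=1.0):
    # The ranges touch end to start; the modulo makes 1.0 wrap round to 0.0.
    return a[1] % size == b[0] % size or b[1] % size == a[0] % size


def is_neighbour(za, zb):
    # Abut along one dimension and overlap along all the others.
    dims = range(len(za))
    return any(
        abuts(za[i], zb[i])
        and all(overlaps(za[j], zb[j]) for j in dims if j != i)
        for i in dims
    )


def neighbour_table(zones):
    return {
        a: sorted(b for b in zones if b != a and is_neighbour(zones[a], zones[b]))
        for a in zones
    }


for name, nbrs in neighbour_table(ZONES).items():
    print(name, nbrs)
```

Note that B and E only count as neighbours because the space wraps: B starts at x = 0 and E ends at x = 1.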

Routing
Routing happens by following a straight-line path through the Cartesian space from source to destination coordinates.
A message with destination point P is routed to the neighbour whose zone coordinates are closest to P.
There are multiple paths available at any point.
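A minimal sketch of this greedy forwarding, using the zones and neighbour lists of the example 2-D space. It assumes "closest" means the shortest torus distance from P to any point of a neighbour's zone, which is one reasonable reading rather than the only one; `route` and the other helper names are hypothetical.

```python
import math

ZONES = {
    "A": ((0.0, 0.5), (0.0, 0.5)),
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "D": ((0.5, 0.75), (0.5, 1.0)),
    "E": ((0.75, 1.0), (0.5, 1.0)),
}
NEIGHBOURS = {
    "A": ["B", "C"],
    "B": ["A", "D", "E"],
    "C": ["A", "D", "E"],
    "D": ["B", "E", "C"],
    "E": ["B", "D", "C"],
}


def contains(zone, p):
    # Zones are treated as half-open ranges so boundaries have one owner.
    return all(lo <= c < hi for (lo, hi), c in zip(zone, p))


def axis_gap(span, c):
    # Distance from coordinate c to the range, allowing wrap-around.
    lo, hi = span
    if lo <= c <= hi:
        return 0.0
    return min((lo - c) % 1.0, (c - hi) % 1.0)


def distance(zone, p):
    return math.hypot(*(axis_gap(s, c) for s, c in zip(zone, p)))


def route(start, p, max_hops=16):
    """Forward greedily until the current node's zone contains p."""
    path = [start]
    while not contains(ZONES[path[-1]], p):
        if len(path) > max_hops:
            raise RuntimeError("no route found within max_hops")
        nxt = min(NEIGHBOURS[path[-1]], key=lambda n: distance(ZONES[n], p))
        path.append(nxt)
    return path


print(route("A", (0.8, 0.9)))
```

Each node only consults its own neighbour list, so no node needs a global view of the space.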

Routing to P(x,y)
[Figure: a route through the 2-D space to the zone containing P(x,y)]

Joining the CAN
To join the CAN, as with many other systems, a node needs a bootstrap node, i.e. the address and coords of a node already in the system.
When a new node wants to join, it randomly chooses a point in the coordinate space.
A join request is routed to the node whose zone contains the point.
That node splits its zone in half, keeping one half and handing over the other to the new node.
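The split step can be sketched as follows. The slides do not say along which dimension a zone is halved, so this example assumes the widest dimension is chosen (lowest index on ties); routing the join request and handing over neighbours and stored data are left out, and `split_zone` and `join` are hypothetical names.

```python
def contains(zone, p):
    return all(lo <= c < hi for (lo, hi), c in zip(zone, p))


def split_zone(zone):
    """Halve a zone along its widest dimension (an assumed policy)."""
    widths = [hi - lo for lo, hi in zone]
    dim = widths.index(max(widths))
    lo, hi = zone[dim]
    mid = (lo + hi) / 2
    kept, given = list(zone), list(zone)
    kept[dim] = (lo, mid)
    given[dim] = (mid, hi)
    return tuple(kept), tuple(given)


def join(zones, new_node, point):
    """Find the owner of `point`, split its zone, give one half to new_node."""
    owner = next(name for name, z in zones.items() if contains(z, point))
    zones[owner], zones[new_node] = split_zone(zones[owner])
    return owner


zones = {"1": ((0.0, 1.0), (0.0, 1.0))}
join(zones, "9", (0.7, 0.2))
print(zones)
```

In a real CAN the owner is found by routing through the overlay rather than by scanning every zone as `join` does here.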

…Joining
Node 9 wants to join. It finds a bootstrap node (out of band). Let’s say it’s node 5.
Node 9 chooses a point (x,y) in the space.
The bootstrap node initiates the routing.
Node 1’s zone contains the point (x,y).
Node 1 splits and hands over half its zone and the relevant neighbours to node 9.
[Figure: 2-D CAN space with numbered node zones, before and after node 9 joins]

CAN Structure
When a node wants to leave the network, it finds a neighbour it can merge its zone with.
Because the coordinate space is recursively divided in half, the network can be thought of as a binary tree in which every network node/zone is a leaf of the tree.
Internal vertices are previously partitioned zones, each split along a particular dimension.
If a node takes over a sibling’s zone, both child nodes of the tree are merged and become their parent.

CAN Seen as a Tree
[Figure: binary tree view of a CAN with zones 1, 2, 3 and 4]

Scalability
The number of neighbours maintained by a node is a function of the number of dimensions, not of the overall size of the CAN: more dimensions, more neighbours.
However, more dimensions mean shorter routing paths.
As the number of nodes n increases, the routing paths grow as O(n^(1/d)),
i.e. there is a trade-off between the number of neighbours (and hence maintenance overhead) and path length.
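To make the trade-off concrete, the sketch below tabulates neighbour count against path length for a fixed network size. It assumes an evenly partitioned space and uses the average path length of (d/4) * n^(1/d) hops given in the original CAN paper; the function names are hypothetical.

```python
def neighbour_count(d):
    """Neighbours per node in an evenly divided d-dimensional space."""
    return 2 * d


def avg_path_hops(n, d):
    """Average routing path length for n nodes, per the CAN paper's estimate."""
    return (d / 4) * n ** (1 / d)


n = 10_000
for d in (2, 3, 4, 6):
    print(f"d={d}: {neighbour_count(d)} neighbours, "
          f"about {avg_path_hops(n, d):.1f} hops on average")
```

Raising d shortens paths quickly at first, while the per-node state grows only linearly in d.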

Maintenance of the CAN
Nodes send each other periodic update messages.
  – zone coords, list of neighbour nodes and their coords
If a node doesn’t hear from a neighbour within a given amount of time, it initiates a TAKEOVER.
  – It starts a timer that is proportional to its own zone size:
  – the bigger its zone, the longer the timer.
  – All the neighbours of the dead node do this.
  – The one with the shortest timer times out first and tells the dead node’s other neighbours.
It then tries to merge its zone with the dead node’s zone.
If a complete zone cannot be made through merging, the neighbour with the smallest zone temporarily looks after two zones.
  – A zone can be merged with another if the two cover the same range in d-1 dimensions and, in the remaining dimension, they abut and have an equal width.
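The takeover race and the merge test can be sketched together. This example assumes the timer is simply the zone volume times a constant, and it ignores merging across the wrap-around edge; `takeover_winner` and `try_merge` are hypothetical names.

```python
import math


def volume(zone):
    v = 1.0
    for lo, hi in zone:
        v *= hi - lo
    return v


def takeover_winner(neighbour_zones, delay_per_unit_volume=1.0):
    """Return the neighbour whose timer (proportional to volume) fires first."""
    timers = {
        name: delay_per_unit_volume * volume(z)
        for name, z in neighbour_zones.items()
    }
    return min(timers, key=timers.get)


def try_merge(a, b):
    """Return the merged zone, or None if a and b do not form a valid zone."""
    differing = [i for i, (sa, sb) in enumerate(zip(a, b)) if sa != sb]
    if len(differing) != 1:
        return None  # must match exactly in d-1 dimensions
    i = differing[0]
    (alo, ahi), (blo, bhi) = a[i], b[i]
    if not math.isclose(ahi - alo, bhi - blo):
        return None  # widths must be equal in the remaining dimension
    if ahi == blo:
        span = (alo, bhi)
    elif bhi == alo:
        span = (blo, ahi)
    else:
        return None  # the zones do not abut
    return a[:i] + (span,) + a[i + 1:]


# Node D of the example space fails; its neighbours are B, C and E.
D = ((0.5, 0.75), (0.5, 1.0))
alive = {
    "B": ((0.0, 0.5), (0.5, 1.0)),
    "C": ((0.5, 1.0), (0.0, 0.5)),
    "E": ((0.75, 1.0), (0.5, 1.0)),
}
winner = takeover_winner(alive)
print(winner, try_merge(alive[winner], D))
```

Favouring the smallest zone tends to even out zone sizes, since the node with the least space is the one that grows.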

CAN Repair Algorithm
Easiest envisaged using the binary tree conception.
If a node finds itself with a zone (call it zone B) that it cannot merge with its existing zone (A):
  – check if B has a sibling that is also a leaf
  – if so, this is easy: merge
  – if not, perform a depth-first search down the subtree rooted at B’s sibling, until two sibling leaves are found
  – then merge those as usual (this leaves one node without a zone)
  – hand over B to the node left with no zone after merging
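The repair steps can be sketched on an explicit binary tree. This is a simplified illustration: it assumes the search runs in the subtree of B's sibling, that the left leaf of the pair found keeps the merged zone, and that zones are represented only by tree position; `Node`, `find_leaf_pair` and `repair` are hypothetical names.

```python
class Node:
    """Tree vertex: a leaf has an owner, an internal vertex has two children."""

    def __init__(self, owner=None, left=None, right=None):
        self.owner = owner
        self.left = left
        self.right = right

    def is_leaf(self):
        return self.left is None and self.right is None


def find_leaf_pair(node):
    """Depth-first search for a vertex whose two children are both leaves."""
    if node.is_leaf():
        return None
    if node.left.is_leaf() and node.right.is_leaf():
        return node
    return find_leaf_pair(node.left) or find_leaf_pair(node.right)


def repair(parent, vacant):
    """Fill the vacant leaf under `parent`; return the owner that takes over."""
    sibling = parent.right if parent.left is vacant else parent.left
    if sibling.is_leaf():
        # Easy case: the sibling's owner absorbs the zone, children collapse.
        parent.owner, parent.left, parent.right = sibling.owner, None, None
        return parent.owner
    pair = find_leaf_pair(sibling)
    keeper, mover = pair.left.owner, pair.right.owner
    # Merge the two sibling leaves into their parent, owned by `keeper`.
    pair.owner, pair.left, pair.right = keeper, None, None
    # The node left without a zone takes over the vacant zone.
    vacant.owner = mover
    return mover


b = Node()  # zone B, whose owner has gone
root = Node(left=b, right=Node(left=Node("n1"),
                               right=Node(left=Node("n2"), right=Node("n3"))))
print(repair(root, b), b.owner)
```

After the repair every leaf again has exactly one owner, which is the invariant the algorithm is restoring.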

Tree Again
Depth-first search for a node with a sibling.
[Figure: tree with zones A and B marked]

Tree Again
Merge siblings into parent node.
[Figure: tree with zones A and B marked]

Tree Again
Assign empty node to unoccupied zone.
[Figure: tree with zones A and B marked]

Resource Availability
If a node fails, its resources go with it.
  – So entities that publish need to periodically refresh the data.
Resource duplication is another mechanism. This can be done in two ways:
  – Overloading coordinate zones: multiple nodes share a zone and the resources mapped to it.
      – They know about each other.
      – Data can be either replicated or partitioned.
      – New peers can choose neighbours with a lower latency.
  – Multiple ‘realities’…

Realities
Realities can be ‘stacked’, so a node maintains different zones in different realities concurrently.
This allows for duplication of resources and shorter routing paths, i.e. in a 2-D coordinate system routing can happen ‘horizontally’ and ‘vertically’.

Further Design Extensions
Caching
  – Because there are multiple paths to a key, caches can grow with the popularity of a key,
  – i.e. if I route requests for a particular key many times, I can decide to get the data and store it myself.
Location awareness
  – use landmarks, e.g. DNS root name servers
  – measure the Round Trip Time (RTT) to each landmark
  – order the landmarks according to RTT
  – when joining, choose a node whose landmark ordering is similar
  – this means nodes that are close together make up neighbourhoods
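The landmark idea can be sketched as follows. The slides do not define what makes two orderings "similar", so this example assumes a simple count of positions at which the orderings agree; the landmark names, RTT values and function names are made up for illustration.

```python
def landmark_ordering(rtts):
    """Return landmark names sorted from lowest to highest RTT."""
    return tuple(sorted(rtts, key=rtts.get))


def similarity(a, b):
    """Number of positions at which two orderings agree (assumed metric)."""
    return sum(x == y for x, y in zip(a, b))


def choose_entry_node(my_rtts, candidates):
    """Pick the candidate whose landmark ordering is most like ours."""
    mine = landmark_ordering(my_rtts)
    return max(
        candidates,
        key=lambda name: similarity(mine, landmark_ordering(candidates[name])),
    )


# Hypothetical RTT measurements in milliseconds to three landmarks.
my_rtts = {"L1": 30, "L2": 10, "L3": 20}
candidates = {
    "near": {"L1": 35, "L2": 12, "L3": 25},
    "far": {"L1": 5, "L2": 80, "L3": 40},
}
print(landmark_ordering(my_rtts), choose_entry_node(my_rtts, candidates))
```

Nodes that see the landmarks in the same order are likely to be close to each other in the underlying network, which is what lets neighbourhoods reflect physical proximity.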
