Distributed Hash Tables


1 Distributed Hash Tables
Parallel and Distributed Computing Spring 2011

2 Distributed Hash Tables
Academic answer to p2p. Goals:
- Guaranteed lookup success
- Provable bounds on search time
- Provable scalability
Makes some things harder: fuzzy queries / full-text search / etc.
Hot topic in networking since their introduction in ~2000/2001.

3 DHT: Overview
Abstraction: a distributed "hash table" (DHT) data structure that supports two operations:
- put(id, item)
- item = get(id)
Implementation: the nodes in the system form a distributed data structure, which can be a ring, tree, hypercube, skip list, butterfly network, ...

4 What Is a DHT?
A building block used to locate key-based objects over millions of hosts on the Internet. Inspired by the traditional hash table (a minimal interface sketch follows this slide):
- key = Hash(name)
- put(key, value)
- get(key) -> value
Challenges:
- Decentralized: no central authority
- Scalable: low network traffic overhead
- Efficient: find items quickly (low latency)
- Dynamic: nodes fail, new nodes join
- General-purpose: flexible naming
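To make the interface concrete, here is a minimal sketch. It is a hypothetical single-machine stand-in, not any particular DHT implementation: dht_key and LocalStore are invented names, and the dictionary only mimics the put/get contract.

```python
import hashlib

def dht_key(name: str) -> int:
    """key = Hash(name): derive a fixed-size numeric key from an object name."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big")

class LocalStore:
    """In-memory stand-in exposing the two DHT operations, put and get."""
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

store = LocalStore()
k = dht_key("yellow-submarine.mp3")
store.put(k, b"...file data...")
print(store.get(k))
```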

5 The Lookup Problem
[Figure: nodes N1-N6 connected across the Internet.] A publisher issues Put(key="title", value=file data...) and a client issues Get(key="title"). There are 1000s of nodes, and the set of nodes may change...

6 DHTs: Main Idea
[Figure: publisher and client rendezvous across nodes N1-N9.] The publisher stores key=H(audio data), value={artist, album title, track title}; the client later issues Lookup(H(audio data)). The DHT acts as a consistent rendezvous point between publisher and client. Challenges: can we make it robust? Keep per-node state small? Actually find items in a changing system?

7 DHT: Overview (2)
Structured overlay routing:
- Join: on startup, contact a "bootstrap" node and integrate yourself into the distributed data structure; get a node id.
- Publish: route the publication for a file id toward a close node id along the data structure.
- Search: route a query for a file id toward a close node id. The data structure guarantees that the query will meet the publication.
- Fetch, two options: the publication contains the actual file => fetch it from where the query stops; or the publication says "I have file X" => the query tells you who has X, and you use IP routing to get X from that node.

8 From Hash Tables to Distributed Hash Tables
Challenge: scalably distributing the index space.
- Scalability issue with plain hash tables: adding a new node (bucket) forces many items to move.
- Solution: consistent hashing (Karger et al., 1997).
Consistent hashing (a small sketch follows this slide):
- circular ID space with a distance metric;
- objects and nodes are mapped onto the same space;
- a key is stored at its successor: the node with the next higher ID.
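The successor rule can be shown in a few lines. This is a sketch under simplifying assumptions: a 16-bit identifier space, SHA-1 truncated by a modulo, and node "addresses" that are plain strings; hash_id and Ring are illustrative names.

```python
import bisect
import hashlib

M = 16  # bits of the circular identifier space (illustrative)

def hash_id(name: str) -> int:
    """Map names (of objects or nodes) onto the same circular ID space."""
    return int.from_bytes(hashlib.sha1(name.encode()).digest(), "big") % (2 ** M)

class Ring:
    def __init__(self, node_names):
        self.node_ids = sorted(hash_id(n) for n in node_names)

    def successor(self, key: int) -> int:
        """A key is stored at its successor: the next node ID at or after
        the key, wrapping around the circle."""
        i = bisect.bisect_left(self.node_ids, key)
        return self.node_ids[i % len(self.node_ids)]

ring = Ring(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(ring.successor(hash_id("yellow-submarine.mp3")))
```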

9 DHT: Consistent Hashing
[Figure: circular ID space with keys K5, K20, K80 and nodes N32, N90, N105.] A key is stored at its successor, the node with the next higher ID: here K5 and K20 are stored at N32, and K80 at N90.

10 What Is a DHT?
Distributed hash table:
- key = Hash(data)
- lookup(key) -> IP address
- put(key, value)
- get(key) -> value
The API supports a wide range of applications, and the DHT imposes no structure/meaning on keys. Key/value pairs are persistent and global, so you can store keys in other DHT values and thus build complex data structures (see the sketch below).
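As a toy illustration of storing keys inside other DHT values, the sketch below chains file chunks into a linked list. The dict standing in for the DHT, the JSON record layout, and the helper names are all assumptions made for the example, not part of any DHT API.

```python
import hashlib
import json

store = {}                       # stand-in for the distributed key/value store

def put(key, value): store[key] = value
def get(key): return store.get(key)

def h(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

def store_chunks(chunks):
    """Store chunks back to front; each stored record carries the key of the
    next record, so the values form a linked list inside the DHT."""
    next_key = None
    for chunk in reversed(chunks):
        record = json.dumps({"data": chunk.decode(), "next": next_key})
        key = h(record.encode())
        put(key, record)
        next_key = key
    return next_key              # key of the head of the list

def read_chunks(head_key):
    out, key = [], head_key
    while key is not None:
        record = json.loads(get(key))
        out.append(record["data"])
        key = record["next"]
    return out

head = store_chunks([b"part-1", b"part-2", b"part-3"])
print(read_chunks(head))         # ['part-1', 'part-2', 'part-3']
```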

11 Approaches
Different strategies:
- Chord: constructing a distributed hash table
- CAN: routing in a d-dimensional space
- Many more...
Commonalities: each peer maintains a small part of the index information (its routing table); searches are performed by directed message forwarding.
Differences: performance and qualitative criteria.
To solve the problem of distributing a data access structure, a variety of approaches has recently been developed, all trying to achieve the same goal: performing searches with low latency while consuming little network bandwidth. In the following we study two of these approaches.

12 DHT: Example - Chord
Associate to each node and file a unique id in a one-dimensional space (a ring), e.g. picked from the range [0 ... 2^m - 1], usually the hash of the file content or of the node's IP address.
Properties:
- Routing table size is O(log N), where N is the total number of nodes
- Guarantees that a file is found in O(log N) hops

13 Example 1: Distributed Hash Tables (Chord)
Hashing of search keys AND peer addresses onto binary keys of length m:
- key identifier = SHA-1(key); node identifier = SHA-1(IP address)
- SHA-1 distributes both uniformly
- e.g. with m = 8: key("yellow-submarine.mp3") = 17, hash of a peer's IP address = 3
Data keys are stored at the next larger node key: data with hashed identifier k is stored at the node p such that p is the smallest node ID larger than k (a sketch of this partitioning follows this slide).
In the following we introduce two approaches for constructing a resource location infrastructure that are already in use on the Internet. Each approach is based on a different abstract model for organizing a distributed data access structure. The first approach is based on the idea of distributing a hash table (Chord). Data and node identifiers are mapped into the same key space; we assume the keys are arranged on a circle (in other words, all computations are performed modulo 2^m). Each node then becomes responsible for storing the data that belongs to "its" interval, defined as all key values between its predecessor and itself. A naive organization of the data access structure would lead to either linear routing table size or linear search cost:
1. every peer knows every other peer: O(n) routing table size
2. peers know only their successor: O(n) search cost
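To make the partitioning concrete, here is a sketch that computes, for a handful of hypothetical peer IDs in the 8-bit space used above, which interval of keys each peer is responsible for. The peer IDs are invented for illustration; they are not the ones in the figure.

```python
M = 8                                    # bits, as in the example above
peers = sorted([3, 17, 65, 130, 200])    # hypothetical hashed peer IDs

def interval(i: int):
    """Keys (pred, peer] handled by peers[i]; the smallest peer's interval
    wraps around 0 (all arithmetic is modulo 2^M)."""
    pred = peers[i - 1]                  # peers[-1] is the largest peer
    return (pred + 1) % (2 ** M), peers[i]

for i, p in enumerate(peers):
    lo, hi = interval(i)
    size = (hi - lo) % (2 ** M) + 1
    print(f"peer {p:3d} stores keys [{lo} .. {hi}] ({size} keys)")
```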

14 DHT: Chord Basic Lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120; the query "Where is key 80?" is forwarded around the ring until the answer "N90 has K80" comes back.] With successor-only routing we just need to make progress and not overshoot. We will talk about initialization and robustness later. Now, how about speed?

15 DHT: Chord "Finger Table"
[Figure: node N80 with fingers pointing 1/2, 1/4, 1/8, 1/16, 1/32, 1/64 and 1/128 of the way around the ring.] Small tables, but multi-hop lookup. Table entries hold an IP address and a Chord ID. Navigate in ID space, routing queries ever closer to the key's successor: O(log n)-entry tables, O(log n) hops. Route to a document between 1/4 and 1/2 ... Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i; in other words, the ith finger points 1/2^(m-i) of the way around the ring (a construction sketch follows this slide).
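A small sketch of the construction rule, under the slide's convention that entry i of node n points to the successor of n + 2^i (mod 2^m). The 8-bit space and the node IDs are illustrative.

```python
M = 8

def successor(node_ids, key):
    """First node ID at or after key on the 2^M ring, wrapping around."""
    ids = sorted(node_ids)
    return next((x for x in ids if x >= key), ids[0])

def finger_table(n, node_ids, m=M):
    """Entry i is the first node that succeeds or equals n + 2^i."""
    return [successor(node_ids, (n + 2 ** i) % (2 ** m)) for i in range(m)]

nodes = [3, 17, 65, 80, 130, 200]        # hypothetical ring members
print(finger_table(80, nodes))           # fingers of node 80
```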

16 DHT: Chord Join
Assume an identifier space [0..8), i.e. 3-bit identifiers. Node n1 joins. [Figure: ring positions 0-7 with n1's successor table (columns: i, id+2^i, succ).]

17 DHT: Chord Join
Node n2 joins. [Figure: ring positions 0-7 with the successor tables (columns: i, id+2^i, succ) of n1 and n2.]

18 DHT: Chord Join
Nodes n0 and n6 join. [Figure: ring positions 0-7 with the successor tables (columns: i, id+2^i, succ) of n0, n1, n2 and n6.]

19 DHT: Chord Join
Nodes: n0, n1, n2, n6. Items: f7. [Figure: ring positions 0-7 with each node's successor table (columns: i, id+2^i, succ); item f7 is stored at its successor, n0.]

20 DHT: Chord Routing
Upon receiving a query for item id, a node:
- checks whether it stores the item locally;
- if not, forwards the query to the largest node in its successor table that does not exceed id (a routing sketch follows this slide).
[Figure: query(7) is forwarded around the ring of n0, n1, n2, n6 until it reaches n0, which stores f7.]
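The rule can be exercised on the tiny ring from the join example (nodes n0, n1, n2, n6; item f7 at n0). This is a sketch: the successor tables are recomputed from the rule rather than copied from the figure, and the "does not exceed id" test is taken on the circular interval (node, id] so that wrap-around behaves.

```python
M = 3
NODES = sorted([0, 1, 2, 6])
ITEMS = {0: ["f7"]}                      # id 7 is stored at its successor, n0

def successor(key):
    return next((n for n in NODES if n >= key), NODES[0])

def table(node):
    """Successor table of a node: successor(node + 2^i) for i = 0..M-1."""
    return [successor((node + 2 ** i) % (2 ** M)) for i in range(M)]

def between(x, a, b):
    """True if x lies in the circular interval (a, b] of the 2^M ring."""
    return a < x <= b if a < b else (x > a or x <= b)

def route(start, item_id, name):
    node, hops = start, [start]
    while name not in ITEMS.get(node, []):
        candidates = [t for t in table(node) if between(t, node, item_id)]
        if candidates:
            # make the most progress without overshooting the item id
            node = max(candidates, key=lambda c: (c - node) % 2 ** M)
        else:
            node = table(node)[0]        # last resort: immediate successor
        hops.append(node)
    return hops

print(route(start=1, item_id=7, name="f7"))   # e.g. [1, 6, 0]
```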

21 DHT: Chord Summary
Routing table size? O(log N) fingers.
Routing time? Each hop is expected to halve the distance to the desired id, so expect O(log N) hops (see the argument below).
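A compact version of the halving argument, assuming node IDs are uniformly distributed over the 2^m identifier space (d_k denotes the remaining ID distance after k hops):

```latex
% Each hop at least halves the remaining distance:
\[
  d_k \;\le\; \frac{d_0}{2^k} \;\le\; \frac{2^m}{2^k}.
\]
% After k = \log_2 N hops the remaining interval spans about 2^m/N
% identifiers, which contains O(1) nodes in expectation (O(\log N) w.h.p.),
% so a lookup finishes in O(\log N) hops.
```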

22 Load Balancing in Chord
Network size n = 10^4, 5·10^5 keys. The behavior of Chord has been analyzed by means of simulations. One important issue is whether the workload each peer receives, in terms of the data items stored at the peer, is uniformly distributed. In the situation depicted, a uniform distribution would give about 50 keys per node. As we see, there are nodes with more than 450 keys and many with no keys at all. The problem is that IP addresses do not map uniformly into the data key space.

23 Length of Search Paths
Network size n = 2^12, 100·2^12 keys; path length is about (1/2)·log2(n). The search performance is, as expected, very good: the length of the search paths is closely concentrated around (1/2)·log2(n). The factor 1/2 is explained by the fact that, if we view the routing tables as an embedding of search trees into the network, a search starts at a randomly selected tree depth.

24 Chord Discussion
Performance:
- Search latency: O(log n) (with high probability, provable)
- Message bandwidth: O(log n) (selective routing)
- Storage cost: O(log n) (routing table)
- Update cost: low (like search)
- Node join/leave cost: O(log^2 n)
- Resilience to failures: replication to successor nodes
Qualitative criteria:
- Search predicates: equality of keys only
- Global knowledge: key hashing, network origin
- Peer autonomy: nodes have, by virtue of their address, a specific role in the network
With respect to robustness, Chord can replicate data items. Each data item is then stored at a fixed number of successor peers, so that when a peer fails the data item can still be located at its successor. Chord has a number of qualitative limitations: since it is based on hashing, the only search predicate that can be supported is key equality. The assignment of a peer to its location in the key space is fixed by its IP address, so peers lack autonomy in deciding which data they store. The network can only evolve from a single origin, which implies that either an agreed-upon entry point exists or a global identifier must be maintained for a distinguished Chord network.

25 Example 2: Topological Routing (CAN)
- Based on hashing keys into a d-dimensional space (a torus)
- Each peer is responsible for the keys of a subvolume of the space (a zone)
- Each peer stores the addresses of the peers responsible for the neighboring zones, for routing
- Search requests are greedily forwarded to the peers in the closest zones (see the sketch after this slide)
- The assignment of peers to zones depends on a random selection made by the peer
In topological routing, a geometric space is used as the key space, both for peer and data keys. The space is a d-dimensional torus, where the dimension d is usually low (e.g. 2-10). Peers are responsible for volumes in the space, i.e. they store the data items whose keys belong to their volume. Search requests are routed to neighboring peers whose coordinates are closer to the searched data key with respect to the geometry of the key space. When peers join, they are free to select their peer key.
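Greedy forwarding can be sketched on a deliberately simplified CAN in which all zones are equal grid cells of a 2-dimensional unit torus and each peer knows the four adjacent cells. Real CAN zones are unequal and maintained dynamically; only the forwarding rule (hand the request to the neighbour closest to the target key) is taken from the slide.

```python
import math

GRID = 4                                        # 4 x 4 = 16 equal zones

def centre(cell):
    i, j = cell
    return ((i + 0.5) / GRID, (j + 0.5) / GRID)

def torus_dist(a, b):
    """Euclidean distance on the unit torus (each axis wraps at 1.0)."""
    return math.sqrt(sum(min(abs(x - y), 1 - abs(x - y)) ** 2
                         for x, y in zip(a, b)))

def neighbours(cell):
    i, j = cell
    return [((i + 1) % GRID, j), ((i - 1) % GRID, j),
            (i, (j + 1) % GRID), (i, (j - 1) % GRID)]

def greedy_route(start_cell, key):
    """Forward to the neighbouring zone whose centre is closest to the key,
    until no neighbour is closer than the current zone."""
    path, cell = [start_cell], start_cell
    while True:
        best = min(neighbours(cell), key=lambda c: torus_dist(centre(c), key))
        if torus_dist(centre(best), key) >= torus_dist(centre(cell), key):
            return path                          # this zone owns the key
        cell = best
        path.append(cell)

print(greedy_route(start_cell=(0, 0), key=(0.70, 0.55)))
```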

26 Network Search and Join
The first figure illustrates a CAN organization and a search. One observes that the space can be divided into subvolumes of different sizes. When a search is performed, e.g. starting at coordinate (x,y), it is forwarded stepwise to closer peers until it arrives at the correct subvolume. When a node joins the network, it can decide which subvolume it would like to be responsible for by selecting a coordinate, as illustrated by the figure on the right. Assume peer 7 picks a point that lies in the subvolume peer 1 is currently responsible for. It first performs a search for this point, finds out that peer 1 is responsible for the respective volume, and splits the volume with peer 1. Each of the two nodes is from then on responsible for one half of the volume. The routing tables need to be updated for the neighborhoods of peer 7 and peer 1, which requires O(d) operations. Node 7 joins the network by choosing a coordinate in the volume of node 1 (see the join sketch below).
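A correspondingly simplified join sketch: the new peer picks a random point, the owner of that point splits its zone in half along its longest dimension, and the new peer takes the half containing the point. Neighbour-table updates (the O(d) part) are omitted, and all names are illustrative.

```python
import random

D = 2                                            # dimension of the key space

def contains(zone, point):
    return all(lo <= x < hi for (lo, hi), x in zip(zone, point))

def split(zone):
    """Split a zone in half along its longest dimension."""
    k = max(range(D), key=lambda i: zone[i][1] - zone[i][0])
    lo, hi = zone[k]
    left, right = list(zone), list(zone)
    left[k], right[k] = (lo, (lo + hi) / 2), ((lo + hi) / 2, hi)
    return left, right

zones = {"peer1": [(0.0, 1.0), (0.0, 1.0)]}      # one peer owns the whole space

def join(new_peer):
    point = tuple(random.random() for _ in range(D))   # chosen by the new peer
    owner = next(p for p, z in zones.items() if contains(z, point))
    half_a, half_b = split(zones[owner])
    new_half = half_a if contains(half_a, point) else half_b
    zones[owner] = half_b if new_half is half_a else half_a
    zones[new_peer] = new_half

random.seed(7)
for name in ["peer2", "peer3", "peer4"]:
    join(name)
for peer, zone in zones.items():
    print(peer, zone)
```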

27 CAN Refinements
Multiple realities:
- maintain r different coordinate spaces, with each node holding a zone in each of them;
- this creates r replicas of the (key, value) pairs;
- increases robustness;
- reduces path length, since a search can be continued in the reality where the target is closest.
Overloading zones:
- several peers are responsible for the same zone;
- splits are only performed when a maximum occupancy (e.g. 4) is reached;
- nodes know all other nodes in the same zone, but only one of the peers in each neighboring zone.
CAN offers two ways of increasing failure resilience by creating replicas. One possibility is to manage r different coordinate spaces at the same time, such that each node has a zone in each of them. This creates r replicas of every data item and reduces search time, since there is a higher probability that the search already starts close to the target. The other possibility is to assign multiple peers to the same zone and to split the zone only when a maximum occupancy is reached. Within a zone all nodes know each other, but they know only one of the nodes in each neighboring zone.

28 CAN Path Length
Experimental results show that even with low dimensionality the path length of a search (here, the number of hops) is fairly short. It improves further when multiple realities are used. Note that the axes are on a logarithmic scale.

29 Increasing Dimensions and Realities
Increasing the dimension or the number of realities dramatically reduces the length of the search paths, while it increases the number of neighbors that need to be stored (and updated in case of changes). It can be shown that the search complexity is O(d·n^(1/d)) for n nodes and dimension d. Thus, by choosing a proper dimension, it is possible for a fixed network size to achieve the same search performance as with tree-based approaches (P-Grid, Chord), i.e. O(log n) (see the short derivation below).
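A quick check of that claim, for the illustrative choice d = (1/2)·log2 n (any d proportional to log n behaves the same way):

```latex
\[
  d\,n^{1/d}
  \;=\; \tfrac{1}{2}\log_2 n \cdot n^{\frac{2}{\log_2 n}}
  \;=\; \tfrac{1}{2}\log_2 n \cdot 2^{2}
  \;=\; 2\log_2 n
  \;=\; O(\log n).
\]
```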

30 CAN Discussion
Performance:
- Search latency: O(d·n^(1/d)), depends on the choice of d (with high probability, provable)
- Message bandwidth: O(d·n^(1/d)) (selective routing)
- Storage cost: O(d) (routing table)
- Update cost: low (like search)
- Node join/leave cost: O(d·n^(1/d))
- Resilience to failures: realities and zone overloading
Qualitative criteria:
- Search predicates: spatial distance of multidimensional keys
- Global knowledge: key hashing, network origin
- Peer autonomy: nodes can decide on their position in the key space
An interesting aspect of CAN is that it is possible to exploit the "semantics" of the key space to support more complex search predicates. For example, one could encode "semantic proximity" by properly mapping data keys into the hashed key space. Another possibility would be to map a physical network topology into the CAN key space such that nodes that are neighbors in the CAN space are also physically close. This would reduce the search cost (in terms of search latency, or the number of hops required in the underlying network).

31 Comparison of some P2P Solutions
         | Search paradigm               | Overlay maintenance cost | Search cost
Gnutella | Breadth-first on search graph | O(1)                     | -
Chord    | Implicit binary search trees  | O(log n)                 | O(log n)
CAN      | d-dimensional space           | O(d)                     | O(d·n^(1/d))

32 DHT Applications Not only for sharing music anymore…
- Global file systems [OceanStore, CFS, PAST, Pastiche, UsenetDHT]
- Naming services [Chord-DNS, Twine, SFR]
- DB query processing [PIER, Wisc]
- Internet-scale data structures [PHT, Cone, SkipGraphs]
- Communication services [i3, MCAN, Bayeux]
- Event notification [Scribe, Herald]
- File sharing [OverNet]

33 DHT: Discussion
Pros:
- Guaranteed lookup
- O(log N) per-node state and search scope
Cons:
- No one uses them? (only one file-sharing app)
- Supporting non-exact-match search is hard

34 When are p2p / DHTs useful?
Caching and "soft-state" data: works well! BitTorrent, KaZaA, etc. all use peers as caches for hot data.
Finding read-only data: limited flooding finds hay, DHTs find needles.
BUT...

35 A Peer-to-peer Google?
- Complex intersection queries ("the" + "who"): billions of hits for each term alone
- Sophisticated ranking: must compare many results before returning a subset to the user
- Very, very hard for a DHT / p2p system: needs high inter-node bandwidth (this is exactly what Google does, with massive clusters)

36 Writable, persistent p2p
Do you trust your data to 100,000 monkeys? Node availability hurts.
- Example: store 5 copies of the data on different nodes; when someone goes away, you must re-replicate the data they held.
- Hard drives are *huge*, but cable-modem upload bandwidth is tiny: perhaps 10 GB/day.
- It takes many days to upload the contents of a 200 GB hard drive: a very expensive leave/replication situation!

37 Research Trends: A Superficial History Based on Articles in IPTPS
In the early '00s ( ): DHT-related applications, optimizations, reevaluations... (more than 50% of IPTPS papers!); system characterization; anonymization.
2005-...: BitTorrent improvements, alternatives, gaming it; security, incentives.
More recently: live streaming, P2P TV (IPTV), games over P2P.

38 What's Missing?
Very important lessons were learned... but did we move beyond vertically-integrated applications? Can we distribute complex services on top of p2p overlays?

39 P2P: Summary
Many different styles; remember the pros and cons of each: centralized, flooding, swarming, unstructured and structured routing.
Lessons learned:
- Single points of failure are very bad
- Flooding messages to everyone is bad
- The underlying network topology is important
- Not all nodes are equal
- Incentives are needed to discourage freeloading
- Privacy and security are important
- Structure can provide theoretical bounds and guarantees

