Simple Load Balancing for Distributed Hash Tables

1 Simple Load Balancing for Distributed Hash Tables
Torsha Banerjee. Presentation for Internet & Web Algorithms

2 Presentation Outline
What is a DHT (Distributed Hash Table)?
Why DHTs?
An example
Applications
Load balancing
How lookup works
Paper idea
Performance evaluation
Conclusion
References

3 What is DHT (Distributed Hash Table)?
No central server
Partitions an ID space among n servers
Each distributed server has a partial list of where data is stored in the system
Keys are mapped uniformly to data values
A "lookup" algorithm is required to locate data
put(key, data) and get(key) functions are used to handle data
[Figure: a distributed application calling put(key, data) and get(key) against a distributed hash table spread over many nodes (adopted from Frans Kaashoek)]

4 What is DHT (Distributed Hash Table)? contd..
Architectures
CAN (Content Addressable Network) [1]
keys are hashed into a d-dimensional space
operations performed are insert(key, value) and retrieve(key)
chooses the neighbor nearest to the destination peer for storage
Chord [2]
maps each key to a peer (server) for storage
maintains routing information as nodes join and leave the system

5 What is DHT (Distributed Hash Table)? contd..
Architectures contd..
Pastry [3]
uses a routing table, leaf set, neighborhood set, and file table
routes messages to the node with the ID closest to the key
low message overhead but higher latency and failure rate
Tapestry [4]
one of the earliest DHTs
routes messages to endpoints (peers); objects are known by name, not location
creates a routing mesh of neighbors; each peer stores a neighbor map
complex and hard to maintain

6 Why DHTs?
Incremental scalability of throughput and data capacity as more nodes are added to the cluster
Robustness through data replication across multiple cluster nodes
Building block for peer-to-peer applications
Improved security

7 An Example
Circular number space 0 to 127
Routing rule: move clockwise until current node ID >= key and last-hop node ID < key
Example: key = 42 is routed clockwise to node 58; node 58 "owns" keys in [34, 58]
A newcomer always starts with one known node
The newcomer searches for "self" in the network (hash key = newcomer's node ID); the search returns a node in the vicinity of where the newcomer needs to be
[Figure: ring with nodes 13, 24, 33, 58, 81, 97, 111, 127]
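The routing rule above can be sketched in a few lines of Python (a minimal illustration, not from the paper; the node IDs are taken from the slide's figure):

```python
import bisect

# Minimal sketch of the slide's routing rule on a 0..127 circular ID space:
# a key is owned by the first node met moving clockwise, i.e. the smallest
# node ID >= key, wrapping around to the lowest ID past the end of the ring.
RING_SIZE = 128
nodes = sorted([13, 24, 33, 58, 81, 97, 111, 127])  # node IDs from the figure

def successor(key):
    """Return the node that owns `key` (first node clockwise from key)."""
    i = bisect.bisect_left(nodes, key % RING_SIZE)
    return nodes[i % len(nodes)]  # i == len(nodes) wraps to the lowest ID

print(successor(42))  # 58, since node 58 owns keys in [34, 58]
```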

8 Applications
Any application that requires a hash table (database systems, symbol tables in compilers)
Storage, archival
Web serving, caching
Content distribution
Query & indexing
Naming systems
Communication primitives
Chat services
Application-layer multicasting
Event notification services
Publish/subscribe systems

9 Load Balancing
Distributing processing and communication activity evenly across a computer network
Avoids a single point of failure
Important for networks where it is difficult to predict the number of requests that will be issued to a server
Two components of load balancing:
Consistent hashing over a 1-D space
Some kind of indexing topology used to navigate the space

10 Load Balancing contd..
Consistent hashing:
Both keys and peers are hashed onto a 1-D ring (e.g. the Chord ring)
Keys are assigned to the nearest peer in the clockwise direction
Overlay edges exist for faster search along the ring
Load imbalance may occur with increasing arc length; the maximum arc length is O(log n / n) with high probability
Chord [2] solves this problem to some extent by introducing virtual peers

11 How Lookup works?
In an m-bit identifier space, there are 2^m identifiers
Identifiers are ordered on an identifier circle modulo 2^m; the identifier ring is called the Chord ring
Key k is assigned to the first node whose identifier is equal to or follows k in the identifier space
Each node maintains a routing table (finger table) with at most m entries; the i-th entry of node n points to successor((n + 2^(i-1)) mod 2^m)

12 How Lookup works? contd..
Example: Chord [2], finger table for node 2:

start  interval  successor
3      [3,4)     3
4      [4,6)     5
6      [6,10)    7
10     [10,2)    10

successor(6) = 7
[Figure: identifier ring (m = 4) with nodes 1, 2, 3, 5, 7, 10, 12, 14, 15]

13 Paper Idea
Chord cannot totally solve the load balancing issue
One peer may be responsible for O(log n) times the average number of items
With virtual peers, the number of edges per peer is O(log^2 n) in the worst case, incurring higher messaging cost and storage space
Application of the "power of two choices" [5,6] is proposed in the paper [7]
Each peer represents a point on the circle (no finger table); peers are arranged sequentially
d >= 2 candidate peers are chosen for each item, and the one with the least load is finally selected

14 Paper Idea contd..
With d hash choices, the maximum load of any peer is log log n / log d + O(1) above the average with high probability [5]
In the example, four different hash functions select four candidate peers; their loads are 0.4, 0.46, 0.51, and 0.6, so peer 1 (load 0.4) is selected for storage of item x
[Figure: ring of peers 1-15 with the four candidate peers and their loads]

15 Paper Idea contd..
If any two chosen peers have the same load, the tie needs to be broken:
Arbitrarily, or
Vöcking's "always-go-left" scheme:
Bins are divided into d groups of size n/d, numbered 1 to d
If several locations have the same load, the ball is placed in the location in the leftmost group
Choosing the least loaded arc of smallest length gives better results than Vöcking's scheme
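The always-go-left tie-breaking can be sketched as below (a toy illustration with made-up parameters; the real analysis is in [5,6]):

```python
import random

# Sketch of Vocking's "always-go-left" scheme: n bins are split into d groups
# of size n/d; each ball draws one uniform candidate bin from every group and
# goes to the least-loaded candidate, preferring the leftmost group on ties.
rng = random.Random(0)  # fixed seed so the sketch is reproducible

def always_go_left(loads, d):
    size = len(loads) // d  # group size n/d (assumes d divides n)
    # one uniform candidate from each of the d groups, left to right
    candidates = [g * size + rng.randrange(size) for g in range(d)]
    best = min(candidates, key=lambda b: loads[b])  # min() keeps leftmost on ties
    loads[best] += 1
    return best

loads = [0] * 12
for _ in range(100):           # throw 100 balls into 12 bins, d = 3 groups
    always_go_left(loads, d=3)
```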

16 Paper Idea contd.. Algorithm description
h0 is used to map peers onto the ring
To insert item x, a peer first calculates the hash functions h1(x), …, hd(x)
The peers p1, …, pd corresponding to h1(x), …, hd(x) are looked up in parallel
The pi with the lowest load is selected for storing x
To search for an already inserted item x, the same d peers are looked up and the one storing the key-value pair returns it
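The insert and search steps can be sketched like this (an illustrative toy, not the paper's implementation: SHA-1 salted with the choice index j stands in for independent hash functions, and the peer names are made up):

```python
import bisect
import hashlib

RING = 2 ** 16
D = 3  # number of hash choices, d >= 2

def h(x, j):
    # h_j(x): the j-th hash of item x (salting simulates independent functions)
    return int(hashlib.sha1(f"{j}:{x}".encode()).hexdigest(), 16) % RING

# h0 maps peers onto the ring (here: hash of a made-up peer name)
peer_ids = sorted(int(hashlib.sha1(f"peer{i}".encode()).hexdigest(), 16) % RING
                  for i in range(20))
store = {p: {} for p in peer_ids}  # per-peer key/value storage

def owner(point):
    """Peer responsible for a ring position (nearest clockwise)."""
    i = bisect.bisect_left(peer_ids, point % RING)
    return peer_ids[i % len(peer_ids)]

def insert(x, value):
    candidates = [owner(h(x, j)) for j in range(1, D + 1)]
    target = min(candidates, key=lambda p: len(store[p]))  # least loaded wins
    store[target][x] = value

def search(x):
    for j in range(1, D + 1):          # look up all d candidate peers
        p = owner(h(x, j))
        if x in store[p]:
            return store[p][x]
    return None

insert("item-x", 42)
print(search("item-x"))  # 42
```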

17 Paper Idea contd..
Disadvantage: network traffic increases by a factor of d
Solved by redirection pointers: every candidate peer pj stores a pointer to pi, the least-loaded peer that actually stores x
To search for x, hj(x) is used for some j to find pj; if pj does not hold x, it returns the redirection pointer
With uniform selection of hj, this extra step is required with probability (d-1)/d
A particular peer needs to be active all the time; [8] solves this problem to some extent
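The redirection-pointer idea can be sketched as follows (peer names and helpers are illustrative, not from the paper): every candidate peer remembers which peer actually holds x, so a search contacts a single peer and follows at most one extra hop.

```python
# Illustrative sketch: three candidate peers; each stores real data plus a
# redirection pointer naming the least-loaded peer that actually holds x.
stores = {p: {} for p in ("p1", "p2", "p3")}     # actual key -> value data
redirects = {p: {} for p in ("p1", "p2", "p3")}  # key -> peer holding it

def insert(x, value, candidates):
    target = min(candidates, key=lambda p: len(stores[p]))  # least loaded
    stores[target][x] = value
    for p in candidates:        # all d candidates learn where x really lives
        redirects[p][x] = target

def search(x, contacted_peer):
    if x in stores[contacted_peer]:
        return stores[contacted_peer][x]        # direct hit
    holder = redirects[contacted_peer].get(x)   # otherwise one extra hop
    return stores[holder][x] if holder else None

insert("x", 42, ["p1", "p2", "p3"])
print(search("x", "p2"))  # 42, directly or via p2's redirection pointer
```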

18 Paper Idea contd..
Static placement of items leads to poor performance
This can be solved by letting item x change location when reinserted, if the peer pi that previously stored x has since become heavily loaded
Schemes like load stealing and load shedding are used
Load stealing: if pi is an under-utilized peer, it takes away load from other peers with heavier loads
[Figure: ring of peers; a lightly loaded peer (load 0.02) steals item x from a heavily loaded peer (load 0.9), leaving a redirection pointer behind]

19 Paper Idea contd..
Load shedding: if pi is overloaded, it offloads data to lighter peers
Out of the peers p1, …, pd, x can be placed on the k least-loaded peers, thus replicating the data
This enables parallel downloading from multiple sources
[Figure: ring of peers; an overloaded peer (load 0.9) offloads a copy of item x to a lightly loaded peer (load 0.02), leaving a redirection pointer]
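The replication step can be sketched as below (a toy with made-up peer names and loads):

```python
# Sketch: out of the d candidate peers, store copies of x on the k least
# loaded, enabling parallel downloads from multiple sources.
def replicate(x, candidates, loads, k=2):
    targets = sorted(candidates, key=lambda p: loads[p])[:k]
    for p in targets:
        loads[p] += 1   # each chosen peer takes on a copy of x
    return targets

loads = {"p1": 5, "p2": 1, "p3": 3, "p4": 2}
print(replicate("x", ["p1", "p2", "p3", "p4"], loads))  # ['p2', 'p4']
```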

20 Performance Evaluation
Three schemes are compared with Chord [2]:
virtual peers
an unbounded number of virtual peers
the proposed power-of-two scheme
Simulation parameters: number of peers, number of items
Metric: 1st percentile and 99th percentile load

21 Performance Evaluation contd..

22 Performance Evaluation contd..
Maximum load is the key metric: the highest-loaded peers have a greater probability of failing, and a cascading effect leads to poor performance at neighboring peers
Performance of virtual peers is similar to that of Chord: maximum load is high, of the order of 350, and virtual peers increase topology maintenance traffic
Load balancing is best in the unlimited case: less variation in load than the "power of two" choices case, with a similar maximum load, of the order of 150

23 Performance Evaluation contd..
The unlimited resource scenario is unrealistic and expensive
The "power of two" choices case exhibits greater variation in load than the unlimited resource case, but its maximum load is similar
Better load balancing than the virtual peer case
Less routing information is maintained compared to Chord

24 Conclusion
The "power of two" choices scheme provides multiple storage options compared to Chord
Increased fault tolerance
Better performance in highly dynamic environments
Trade-offs: a small increase in the amount of static storage at each peer, and a small additive constant in the search length

25 References
[1] Sylvia Ratnasamy, Paul Francis, Mark Handley, Richard Karp, Scott Shenker, "A Scalable Content-Addressable Network," Proceedings of ACM SIGCOMM 2001.
[2] Ion Stoica, Robert Morris, David Karger, M. Frans Kaashoek, Hari Balakrishnan, "Chord: A Scalable Peer-to-Peer Lookup Service for Internet Applications," Proceedings of ACM SIGCOMM 2001.
[3] A. Rowstron and P. Druschel, "Pastry: Scalable, Distributed Object Location and Routing for Large-Scale Peer-to-Peer Systems," IFIP/ACM International Conference on Distributed Systems Platforms (Middleware), 2001.

26 References contd..
[4] Ben Y. Zhao, John Kubiatowicz, Anthony D. Joseph, "Tapestry: An Infrastructure for Fault-Tolerant Wide-Area Location and Routing," Technical Report UCB/CSD-01-1141, U.C. Berkeley, 2001.
[5] Y. Azar, A. Broder, A. Karlin, and E. Upfal, "Balanced Allocations," SIAM Journal on Computing 29(1):180-200, 1999.
[6] M. Mitzenmacher, A. Richa, and R. Sitaraman, "The Power of Two Choices: A Survey of Techniques and Results," in Handbook of Randomized Computing, Kluwer Academic Publishers.
[7] John Byers, Jeffrey Considine, and Michael Mitzenmacher, "Simple Load Balancing for Distributed Hash Tables," Proceedings of the 2nd International Workshop on Peer-to-Peer Systems (IPTPS), 2003.

