Presentation is loading. Please wait.

Presentation is loading. Please wait.

Consistent Hashing: Load Balancing in a Changing World

Similar presentations


Presentation on theme: "Consistent Hashing: Load Balancing in a Changing World"— Presentation transcript:

1 Consistent Hashing: Load Balancing in a Changing World
David Karger, Eric Lehman, Tom Leighton, Matt Levine, Daniel Lewin, Rina Panigrahy

2 Load Balancing Task: distribute items into buckets
Data to memory locations Files to disks Tasks to processors Web pages to caches (our motivation) Goal: even distribution

3 World Wide Web X IBM LCS ATT CMU Servers Browsers (clients)

4 Hot Spots X W3C THOR IBM LCS CILK PDOS ATT CMU

5 Temporary Loads For permanent loads, use bigger server
Must also deal with “flash crowds” IBM chess match NASA Inefficient to design for max load rarely attained much capacity wasted Better to offload peak load elsewhere

6 Proxy Caches Balance Load
W3C THOR IBM LCS CILK PDOS ATT CMU

7 Proxy Caching Old: server hit once for each browser
New: server hit once for each page Adapts to changing access patterns

8 Proxy Caching Every server can also be a cache Provides a social good
Reduces load at sites you want to contact Costs you little, if done right few accesses small amount of storage (times many servers)

9 Who Caches What? Each cache should hold few items
else cache gets swamped by clients Each item should be in few caches else server gets swamped by caches and cache invalidations/updates expensive Browser must know right cache could ask server for redirect server gets swamped by redirect requests

10 Hashing Simple and powerful load balancing
Constant time to find bucket for item Example map to n buckets. Pick a,b: y=ax+b (mod n) Intuition: hash maps each item to one random bucket no bucket gets many items

11 Problem: Adding Caches
Suppose a new cache arrives. How work it into hash function? Natural change: y=ax+b (mod n+1) Problem: changes bucket for every item every cache will be flushed servers get swamped with new requests Goal: when add bucket, few items move

12 Problem: Inconsistent Views
Each client knows about a different set of caches: its view View affects choice of cache for item With many views, each cache will be asked for item Item in all caches---swamps server Goal: item in few caches despite views

13 Problem: Inconsistent Views
LCS My view 1 2 3 caches ax+b (mod 4) = 2

14 Problem: Inconsistent Views
LCS John’s view 1 2 3 caches ax+b (mod 4) = 2

15 Problem: Inconsistent Views
LCS Dave’s view 1 2 3 caches ax+b (mod 4) = 2

16 Problem: Inconsistent Views
LCS Al’s view 1 2 3 caches ax+b (mod 4) = 2

17 Problem: Inconsistent Views
LCS Mike’s view 1 2 3 caches ax+b (mod 4) = 2

18 Problem: Inconsistent Views
LCS caches 2 2 2 2 2

19 Consistent Hashing A new kind of hash function
Maps any item to a bucket in my view Computable in constant time, locally 1 standard hash function Adding bucket to view takes logarithmic time logarithmic number of standard hash functions Handles incremental and inconsistent views

20 Single View Properties
Balance: all buckets get roughly same number of items (like standard hashing) Smooth: when an kth bucket is added, only a 1/k fraction of the items move and only from O(log n) servers minimum needed to preserve balance

21 Multiple View Properties
Consider n views, each of an arbitrary constant fraction of the buckets Load: number of items a bucket gets from all views is O(log n) times average Despite views, load balanced Spread: over all views, each item appears in O(log n) buckets Despite views, few caches for each item

22 Implementation Use standard hash function H to map buckets, items to unit interval “random” points spread uniformly Item Bucket

23 Implementation Use standard hash function H to map buckets, items to unit interval “random” points spread uniformly Item assigned to nearest bucket Item Bucket

24 Computation Cost Bucket positions precomputed To hash item
compute H find nearest bucket point O(log n) time using binary search Constant time with auxiliary hash table

25 Balance Bucket points uniformly distributed by H
Each bucket “owns” equal portion of line Item position “random” by H So item equally likely to be at any bucket So all buckets get about same # items Bucket

26 Smoothness To add a kth bucket, hash it to line Old bucket

27 Smoothness To add a kth bucket, hash it to line Old bucket New bucket

28 Smoothness To add a kth bucket, hash it to line
Captures items nearest to it only 1/k fraction of total items only from 2 other buckets (on each side) Old bucket New bucket

29 Low Spread Some views might not see nearest bucket to an item, hash it elsewhere But every view will have a bucket near the item (by “random” placement) So only buckets near the item will ever have to hold it But only a few buckets are near the item (by “random” placement)

30 Low Load Bucket only gets item I if no other bucket is closer to I
Under any view, some bucket is close to I by “random” placement of buckets So a bucket only gets items close to it But an item is unlikely to be close So bucket doesn’t get many items

31 Summary: Consistent Hashing
Trivial to implement (20 lines code) don’t try this at home! Fast to compute (no hidden constants) Uniformly distributes items Can cheaply add/remove buckets Even with multiple views no bucket gets many items each item in only a few buckets

32 Caching Consistent hashing good for caching
client maps known caches to unit interval when item arrives, hash to cache server gets O(log n) requests for its own pages Each server can also be a cache gets small number of requests for others’ pages Consistent hashing is robust caches can come and go different browsers can know different caches

33 Refinements Browser bandwidth to machines may vary
If bandwidth to server high, unwilling to use low bandwidth cache Consistently hash item only to caches with bandwidth as good as server Theorem: all previous properties still hold uniform cache loads low server loads (few caches per item)

34 Fault Tolerance Suppose contacted cache is down
Delete from bucket set, find next closest bucket in consistent hashing interval Just a small change in view… Even with many failures, previous properties (uniform low load, etc) still hold.

35 Hot pages What if one page gets popular?
cache responsible for it gets swamped Use tree of caches (e.g. harvest, DNS)? cache at root gets swamped Use a different tree for each page build using consistent hashing Balances load for hot pages and servers

36 Main Result Using cache trees of logarithmic depth, for any set of page accesses, can adaptively balance load such that every server gets at most log times average load of system (browser/server ratio). (assorted theory caveats)

37 Conclusion Consistent hashing balances load
Simple enough to be practical (?) We’d like to build distributed web cache need help of good systems student Looking for other applications DNS lookup? PICS? URN resolution? NOWs? Distributed file systems?


Download ppt "Consistent Hashing: Load Balancing in a Changing World"

Similar presentations


Ads by Google