Consistent Hashing: Load Balancing in a Changing World


Consistent Hashing: Load Balancing in a Changing World
David Karger, Eric Lehman, Tom Leighton, Matt Levine, Daniel Lewin, Rina Panigrahy

Load Balancing
- Task: distribute items into buckets
  - data to memory locations
  - files to disks
  - tasks to processors
  - web pages to caches (our motivation)
- Goal: even distribution

World Wide Web
[Diagram: browsers (clients) connecting across the web to servers such as IBM, LCS, ATT, and CMU]

Hot Spots
[Diagram: many browsers converging on one server among W3C, THOR, IBM, LCS, CILK, PDOS, ATT, CMU, overloading it]

Temporary Loads
- For permanent loads, use a bigger server
- Must also deal with "flash crowds"
  - IBM chess match
  - NASA
- Inefficient to design for max load
  - rarely attained
  - much capacity wasted
- Better to offload peak load elsewhere

Proxy Caches Balance Load
[Diagram: proxy caches interposed between browsers and the servers W3C, THOR, IBM, LCS, CILK, PDOS, ATT, CMU]

Proxy Caching
- Old: server hit once for each browser
- New: server hit once for each page
- Adapts to changing access patterns

Proxy Caching
- Every server can also be a cache
- Provides a social good
  - reduces load at sites you want to contact
- Costs you little, if done right
  - few accesses
  - small amount of storage (times many servers)

Who Caches What?
- Each cache should hold few items
  - else the cache gets swamped by clients
- Each item should be in few caches
  - else the server gets swamped by caches
  - and cache invalidations/updates become expensive
- Browser must know the right cache
  - could ask the server for a redirect
  - but then the server gets swamped by redirect requests

Hashing
- Simple and powerful load balancing
- Constant time to find the bucket for an item
- Example: map to n buckets by picking a, b and computing y = ax + b (mod n)
- Intuition: the hash maps each item to one "random" bucket
  - no bucket gets many items
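A minimal sketch of the modular hashing example above, in Python; the constants a and b are arbitrary illustrative choices, not values from the talk:

```python
# Standard modular hashing: y = a*x + b (mod n).
A, B = 31, 17  # fixed, arbitrarily chosen constants

def bucket_for(item_key: int, n_buckets: int) -> int:
    """Map an integer item key to one of n_buckets buckets."""
    return (A * item_key + B) % n_buckets

print(bucket_for(1234, 4))  # some bucket in 0..3
```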

Problem: Adding Caches
- Suppose a new cache arrives. How do we work it into the hash function?
- Natural change: y = ax + b (mod n+1)
- Problem: this changes the bucket for almost every item
  - every cache will be flushed
  - servers get swamped with new requests
- Goal: when a bucket is added, few items move
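A quick experiment showing how badly the "natural change" behaves; the constants are the same arbitrary choices as in the previous sketch:

```python
# Count the items that keep their bucket when a 5th cache joins.
A, B = 31, 17

def bucket_for(key: int, n: int) -> int:
    return (A * key + B) % n

items = range(10_000)
unmoved = sum(1 for k in items if bucket_for(k, 4) == bucket_for(k, 5))
print(f"{unmoved / 10_000:.1%} stay put")
# Only 1/(n+1) = 20% of items keep their bucket; the other 80% move,
# flushing every cache. Ideally the proportions would be reversed:
# only about 1/(n+1) of the items should move.
```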

Problem: Inconsistent Views
- Each client knows about a different set of caches: its view
- The view affects the choice of cache for an item
- With many views, each cache will be asked for the item
  - item ends up in all caches, swamping the server
- Goal: item in few caches despite differing views

Problem: Inconsistent Views
[Diagram sequence: five clients (my view, John's, Dave's, Al's, Mike's) each know a different set of four caches; ax+b (mod 4) sends the same LCS page to "cache 2" of each view, but that is a different physical cache in each view, so the page ends up in five caches and every one of them fetches it from the server.]

Consistent Hashing
- A new kind of hash function
- Maps any item to a bucket in my view
- Computable in constant time, locally
  - one standard hash function
- Adding a bucket to a view takes logarithmic time
  - a logarithmic number of standard hash functions
- Handles incremental and inconsistent views

Single View Properties
- Balance: all buckets get roughly the same number of items (as in standard hashing)
- Smoothness: when a kth bucket is added, only a 1/k fraction of the items move
  - and only from O(log n) servers
  - the minimum movement needed to preserve balance

Multiple View Properties
- Consider n views, each of an arbitrary constant fraction of the buckets
- Load: the number of items a bucket gets from all views is O(log n) times the average
  - despite the views, load stays balanced
- Spread: over all views, each item appears in O(log n) buckets
  - despite the views, few caches hold each item

Implementation
- Use a standard hash function H to map buckets and items to the unit interval
  - "random" points spread uniformly
- Each item is assigned to the nearest bucket
[Diagram: item and bucket points on the unit interval, each item mapped to its nearest bucket point]

Computation Cost
- Bucket positions are precomputed
- To hash an item:
  - compute H
  - find the nearest bucket point
- O(log n) time using binary search
- Constant time with an auxiliary hash table
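The recipe on these slides translates almost directly into code. Below is a sketch in Python: buckets and items are hashed to the unit interval (SHA-256 is an arbitrary stand-in for the "standard hash function H"), bucket positions are kept sorted, and lookup binary-searches for the nearest bucket point. The class and method names are mine, not the paper's:

```python
import bisect
import hashlib

def point(key: str) -> float:
    """Hash any string to a 'random' point on the unit interval [0, 1)."""
    digest = hashlib.sha256(key.encode()).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

class ConsistentHashRing:
    """Buckets and items live on the unit interval; an item goes to the nearest bucket."""

    def __init__(self, buckets=()):
        self._points = []  # sorted bucket positions on [0, 1)
        self._names = []   # bucket names, parallel to _points
        for b in buckets:
            self.add_bucket(b)

    def add_bucket(self, name: str) -> None:
        p = point(name)
        i = bisect.bisect(self._points, p)
        self._points.insert(i, p)  # O(n) list insert; a balanced tree gives O(log n)
        self._names.insert(i, name)

    def remove_bucket(self, name: str) -> None:
        i = self._names.index(name)
        del self._points[i], self._names[i]

    def lookup(self, item: str) -> str:
        """Binary search for the bucket point nearest the item, wrapping around."""
        p = point(item)
        n = len(self._points)
        i = bisect.bisect(self._points, p) % n  # next bucket point to the right
        j = (i - 1) % n                         # bucket point to the left
        def dist(a, b):
            d = abs(a - b)
            return min(d, 1 - d)  # circular distance: the interval's ends meet
        if dist(p, self._points[i]) < dist(p, self._points[j]):
            return self._names[i]
        return self._names[j]
```

The interval is treated as a circle (the wraparound in dist), which is what lets the first and last bucket points serve items hashed near the ends.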

Balance
- Bucket points are uniformly distributed by H
  - each bucket "owns" an equal portion of the line
- Item positions are "random" by H
  - so an item is equally likely to land at any bucket
  - so all buckets get about the same number of items

Smoothness
- To add a kth bucket, hash it to the line
- It captures the items nearest to it
  - only a 1/k fraction of the total items
  - and only from the 2 neighboring buckets (one on each side)
[Diagram: a new bucket point inserted on the line between old bucket points]
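To see smoothness concretely, this small experiment (reusing the hypothetical ConsistentHashRing sketched under Computation Cost) adds a tenth bucket and counts how many items change hands:

```python
# Assumes the ConsistentHashRing sketch from the Computation Cost section.
ring = ConsistentHashRing([f"cache{i}" for i in range(9)])
items = [f"page{i}" for i in range(10_000)]
before = {it: ring.lookup(it) for it in items}

ring.add_bucket("cache9")  # the k = 10th bucket joins
moved = [it for it in items if ring.lookup(it) != before[it]]
print(f"{len(moved) / len(items):.1%} of items moved")  # roughly 1/k = 10% in expectation

# Every item that moved went *to* the new bucket; no old bucket gained items.
assert all(ring.lookup(it) == "cache9" for it in moved)
```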

Low Spread
- Some views might not see the bucket nearest an item, and hash it elsewhere
- But every view will have some bucket near the item (by "random" placement)
- So only buckets near the item will ever have to hold it
- And only a few buckets are near the item (by "random" placement)

Low Load
- A bucket only gets item I if no other bucket is closer to I
- Under any view, some bucket is close to I (by "random" placement of buckets)
- So a bucket only gets items close to it
- But an item is unlikely to be close
- So a bucket doesn't get many items

Summary: Consistent Hashing
- Trivial to implement (20 lines of code)
  - don't try this at home!
- Fast to compute (no hidden constants)
- Uniformly distributes items
- Can cheaply add/remove buckets
- Even with multiple views:
  - no bucket gets many items
  - each item is in only a few buckets

Caching
- Consistent hashing is good for caching
  - a client maps its known caches to the unit interval
  - when an item is requested, hash it to a cache
  - each server gets O(log n) requests for its own pages
- Every server can also be a cache
  - it gets a small number of requests for others' pages
- Consistent hashing is robust
  - caches can come and go
  - different browsers can know different caches
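A sketch of the client-side picture, again reusing the hypothetical ring class; the cache names and URL are made up for illustration:

```python
# Assumes the ConsistentHashRing sketch from the Computation Cost section.
# Each browser builds a ring from the caches *it* happens to know about (its view).
my_view = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
johns_view = ConsistentHashRing(["cache-b", "cache-c", "cache-d"])

url = "http://lcs.mit.edu/index.html"
print(my_view.lookup(url), johns_view.lookup(url))
# The views often agree; even when they differ, the low-spread property
# says the page lands in only a few caches across all views.
```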

Refinements
- Browser bandwidth to machines may vary
  - if bandwidth to the server is high, a browser is unwilling to use a low-bandwidth cache
- Consistently hash each item only to caches with bandwidth as good as the server's
- Theorem: all previous properties still hold
  - uniform cache loads
  - low server loads (few caches per item)

Fault Tolerance
- Suppose the contacted cache is down
- Delete it from the bucket set and find the next closest bucket on the consistent hashing interval
- Just a small change in view…
- Even with many failures, the previous properties (uniform low load, etc.) still hold
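The failover rule from this slide, sketched with the same hypothetical class:

```python
# Assumes the ConsistentHashRing sketch from the Computation Cost section.
ring = ConsistentHashRing([f"cache{i}" for i in range(10)])
primary = ring.lookup("hot-page")

# The contacted cache is down: delete it from our view and rehash.
ring.remove_bucket(primary)
fallback = ring.lookup("hot-page")  # the next closest bucket on the interval
print(primary, "->", fallback)
```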

Hot Pages
- What if one page gets popular?
  - the cache responsible for it gets swamped
- Use a tree of caches (e.g. Harvest, DNS)?
  - the cache at the root gets swamped
- Use a different tree for each page
  - built using consistent hashing
- Balances load for both hot pages and hot servers

Main Result
Using cache trees of logarithmic depth, for any set of page accesses, we can adaptively balance load so that every server gets at most a logarithmic factor times the average load of the system (the browser/server ratio). (Assorted theory caveats apply.)

Conclusion
- Consistent hashing balances load
- Simple enough to be practical (?)
- We'd like to build a distributed web cache
  - need the help of a good systems student
- Looking for other applications
  - DNS lookup? PICS? URN resolution? NOWs? Distributed file systems?