CS 545 – Fundamentals of Stream Processing – Consistent Hashing


1 Consistent Hashing
Buğra Gedik

2 Hashing
- Common operation used in splitting flows
  - Fission optimization relies on splitting
    - Partitioned stateful parallelism requires sending tuples with the same partitioning attribute value to the same parallel channel
  - Load balancing relies on splitting
    - The same amount of flow should be sent to each of the parallel channels
- Simple solution to achieve both: hash(partitioningAttribute) % numChannels
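A minimal Java sketch of the simple solution, assuming Java's hashCode() stands in for the hash function; the names ModuloPartitioner and channelFor are hypothetical. It uses Math.floorMod rather than %, because hashCode() can be negative and Java's % would then return a negative channel index.

```java
import java.util.Objects;

// Sketch of hash(partitioningAttribute) % numChannels.
// Assumption: Java's hashCode() plays the role of the hash function.
final class ModuloPartitioner {
    private ModuloPartitioner() {}

    // Returns the parallel channel for a tuple's partitioning attribute.
    // Math.floorMod keeps the result in [0, numChannels) even for negative hash codes.
    static int channelFor(Object partitioningAttribute, int numChannels) {
        if (numChannels <= 0) {
            throw new IllegalArgumentException("numChannels must be positive");
        }
        int hash = Objects.hashCode(partitioningAttribute);
        return Math.floorMod(hash, numChannels);
    }

    public static void main(String[] args) {
        // Tuples with the same attribute value always go to the same channel.
        System.out.println(channelFor("user-42", 4) == channelFor("user-42", 4)); // true
        // A negative hash code (-7) still yields a valid channel index.
        System.out.println(channelFor(-7, 3)); // 2
    }
}
```

Both requirements hold for a fixed numChannels; the trouble only appears when numChannels changes.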

3 Illustration
- Item hashes: H(“U”) → 28, H(“B”) → 12, H(“G”) → 35, H(“Z”) → 3, H(“p”) → 10, H(“k”) → 5
- With 2 nodes (H % 2): N0 = {“U”, “B”, “p”}, N1 = {“G”, “Z”, “k”}
- With 3 nodes (H % 3): N0 = {“B”, “Z”}, N1 = {“U”, “p”}, N2 = {“G”, “k”}
- Cons:
  - More than the ideal number of migrations: 5 instead of 2 (ideally 1/3 of the 6 items move)
  - This results in unnecessary state transfers for elastic fission
  - Moves between existing nodes are the culprit
- Pros:
  - Efficient computation: O(1) time, no memory used
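The illustration can be reproduced with the hash values copied from the slide; the class and method names below are hypothetical. Each item goes to node H % numNodes, and the sketch counts how many items change node when going from 2 to 3 nodes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reproduces the modulo-hashing illustration using the slide's hash values.
final class ModuloMigration {
    private ModuloMigration() {}

    // Hash values H(item) copied from the slide.
    static Map<String, Integer> slideHashes() {
        Map<String, Integer> hashes = new LinkedHashMap<>();
        hashes.put("U", 28);
        hashes.put("B", 12);
        hashes.put("G", 35);
        hashes.put("Z", 3);
        hashes.put("p", 10);
        hashes.put("k", 5);
        return hashes;
    }

    // Counts items whose node (H % nodes) differs between oldNodes and newNodes.
    static int countMigrations(Map<String, Integer> hashes, int oldNodes, int newNodes) {
        int moved = 0;
        for (int h : hashes.values()) {
            if (h % oldNodes != h % newNodes) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        Map<String, Integer> hashes = slideHashes();
        for (Map.Entry<String, Integer> e : hashes.entrySet()) {
            int h = e.getValue();
            System.out.printf("%s: N%d -> N%d%n", e.getKey(), h % 2, h % 3);
        }
        System.out.println("migrations: " + countMigrations(hashes, 2, 3)); // 5
    }
}
```

Three of the five moves (“U”, “Z”, “p”) go between the existing nodes N0 and N1.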

4 Requirements
- Balance
  - Each node should be assigned approximately the same number of items
- Migration
  - When moving from N to N+1 nodes, we should migrate approximately 1/(N+1) of the items
- Fast computation
  - The time to find the node should be small, ideally O(1) or O(log N)
- Reasonable memory consumption
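One way to check the balance and migration requirements for any assignment function (hash value, number of nodes) → node is sketched below. The helper names are hypothetical, and the evenly spread hash values 0..9999 are an assumption chosen so the numbers come out exact. Modulo hashing is perfectly balanced here, but going from 4 to 5 nodes migrates 80% of the items instead of the ideal 1/5: an item stays only when h % 4 == h % 5, which holds for 4 of every 20 consecutive values.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical helpers that measure the balance and migration requirements.
final class RequirementCheck {
    private RequirementCheck() {}

    // The modulo scheme from the previous slides, as an assignment function.
    static final IntBinaryOperator MODULO = (h, n) -> Math.floorMod(h, n);

    // Assumption for illustration: hash values 0..count-1, evenly spread.
    static int[] evenlySpread(int count) {
        int[] hashes = new int[count];
        for (int i = 0; i < count; i++) {
            hashes[i] = i;
        }
        return hashes;
    }

    // Most loaded node's item count divided by the mean count (1.0 = perfect balance).
    static double imbalance(int[] hashes, int numNodes, IntBinaryOperator assign) {
        int[] load = new int[numNodes];
        for (int h : hashes) {
            load[assign.applyAsInt(h, numNodes)]++;
        }
        int max = 0;
        for (int l : load) {
            max = Math.max(max, l);
        }
        return max / ((double) hashes.length / numNodes);
    }

    // Fraction of items whose node changes when going from n to n + 1 nodes.
    static double migrationFraction(int[] hashes, int n, IntBinaryOperator assign) {
        int moved = 0;
        for (int h : hashes) {
            if (assign.applyAsInt(h, n) != assign.applyAsInt(h, n + 1)) {
                moved++;
            }
        }
        return (double) moved / hashes.length;
    }

    public static void main(String[] args) {
        int[] hashes = evenlySpread(10_000);
        System.out.println(imbalance(hashes, 4, MODULO));         // 1.0
        System.out.println(migrationFraction(hashes, 4, MODULO)); // 0.8 (ideal: 1/5 = 0.2)
    }
}
```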

5 Two Hashing Algorithms
- Rendezvous Hashing
  - Thaler, D.; Ravishankar, C. (February 1998). "Using Name-Based Mapping Schemes to Increase Hit Rates". IEEE/ACM Transactions on Networking (TON), 6(1): 1–14. doi: /
- Consistent Hashing
  - Karger, D.; Lehman, E.; Leighton, T.; Panigrahy, R.; Levine, M.; Lewin, D. (1997). "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". ACM Symposium on Theory of Computing (STOC), pp. 654–663. doi: /

6 Rendezvous Hashing
- Given an item I
  - For each node i, compute H_i = Hash(I ⊕ i)
  - Return h = argmax_i H_i
- Intuition
  - When the (N+1)-th node, say n, joins, consider an item I already assigned to one of the existing nodes
  - Since the hash function is uniform, each of the N+1 values is equally likely to be the largest, so the probability that Hash(I ⊕ n) exceeds the previous highest value is 1/(N+1)
  - On average, 1/(N+1) of the items will move
  - There will never be moves between existing nodes: an item either stays where it is or moves to n
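A minimal sketch of the argmax rule, assuming a SplitMix64-style 64-bit mixer over the pair (item hash, node id) as a stand-in for Hash(I ⊕ i); the mixer choice and all names are assumptions, not the course implementation. countMoves adds one node and reports how many items moved and how many of those moved between existing nodes.

```java
// Sketch of rendezvous (highest-random-weight) hashing.
// Assumption: a SplitMix64-style mixer over (item hash, node id) stands in for Hash(I ⊕ i).
final class RendezvousHashing {
    private RendezvousHashing() {}

    // SplitMix64 finalizer: spreads the input bits across all 64 output bits.
    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Hash(I ⊕ i): item hash in the high 32 bits, node id in the low 32 bits.
    static long hash(Object item, int node) {
        return mix(((long) item.hashCode() << 32) | (node & 0xFFFFFFFFL));
    }

    // Returns argmax_i Hash(I ⊕ i) over nodes 0..numNodes-1, in O(numNodes) time.
    static int nodeFor(Object item, int numNodes) {
        int best = 0;
        long bestHash = hash(item, 0);
        for (int i = 1; i < numNodes; i++) {
            long h = hash(item, i);
            if (h > bestHash) { // strict '>' keeps ties on the lower node id
                best = i;
                bestHash = h;
            }
        }
        return best;
    }

    // Adds node number oldNodes; returns {items moved, items moved between existing nodes}.
    static int[] countMoves(int items, int oldNodes) {
        int moved = 0;
        int movedBetweenExisting = 0;
        for (int k = 0; k < items; k++) {
            int before = nodeFor(k, oldNodes);
            int after = nodeFor(k, oldNodes + 1);
            if (before != after) {
                moved++;
                if (after != oldNodes) {
                    movedBetweenExisting++;
                }
            }
        }
        return new int[] {moved, movedBetweenExisting};
    }

    public static void main(String[] args) {
        int[] moves = countMoves(100_000, 4);
        // Expect roughly 1/5 of the items to move, all of them to the new node 4.
        System.out.printf("moved: %d, moved between existing nodes: %d%n", moves[0], moves[1]);
    }
}
```

The second count is always 0: the hashes for the existing nodes do not change, so the only way the argmax can change is for the new node to win.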

7 The Hash Function
- The hash function is:
  - Deterministic
  - Uniform (all output values are equally likely)
  - As collision-free as possible
- In my Java implementation, I used:
  - Form a 16-byte sequence: hashCode(I), -hashCode(I), -hashCode(n), hashCode(n)
  - Apply the MD5 hash to it (128-bit result)
  - Convert the result to a BigInteger
  - Return the hashCode of the BigInteger
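The recipe above can be sketched in Java as follows. Two details are assumptions not stated on the slide: the four ints are written big-endian (ByteBuffer's default), and the digest is read with the signed BigInteger(byte[]) constructor. The class name and the nodeFor helper are hypothetical.

```java
import java.math.BigInteger;
import java.nio.ByteBuffer;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the MD5-based Hash(I ⊕ n) described on the slide.
// Assumptions: big-endian byte order and the signed BigInteger(byte[]) constructor.
final class Md5PairHash {
    private Md5PairHash() {}

    static int hash(Object item, Object node) {
        int hi = item.hashCode();
        int hn = node.hashCode();
        // 4 ints x 4 bytes = 16 bytes: hashCode(I), -hashCode(I), -hashCode(n), hashCode(n)
        byte[] input = ByteBuffer.allocate(16).putInt(hi).putInt(-hi).putInt(-hn).putInt(hn).array();
        try {
            // MD5 yields a 128-bit (16-byte) digest; a fresh instance per call is simple and thread-safe.
            byte[] digest = MessageDigest.getInstance("MD5").digest(input);
            return new BigInteger(digest).hashCode();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform implementation is required to support MD5.
            throw new IllegalStateException(e);
        }
    }

    // Rendezvous lookup with this hash: argmax over node ids 0..numNodes-1.
    static int nodeFor(Object item, int numNodes) {
        int best = 0;
        int bestHash = hash(item, 0);
        for (int i = 1; i < numNodes; i++) {
            int h = hash(item, i);
            if (h > bestHash) {
                best = i;
                bestHash = h;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(hash("U", 0) == hash("U", 0)); // true: deterministic
        System.out.println("U -> N" + nodeFor("U", 3));
    }
}
```

Creating a MessageDigest per call keeps the sketch short; since per-tuple cost matters in stream processing, a real implementation would likely reuse one instance per thread.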

8 Illustration
- Hash values H(item, node):

  Item   N0   N1   N2
  “U”    27   25    7
  “B”    24   49   18
  “G”    34   42   27
  “Z”    18   17   29
  “p”    30    2    8
  “k”     5   13   40

- With 2 nodes (argmax over N0, N1): N0 = {“U”, “Z”, “p”}, N1 = {“B”, “G”, “k”}
- With 3 nodes (argmax over N0, N1, N2): N0 = {“U”, “p”}, N1 = {“B”, “G”}, N2 = {“Z”, “k”}
- Only “Z” and “k” migrate (2 moves, the ideal number), and both move to the new node N2
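The same table can be replayed in a few lines: the H(item, node) values are copied from the slide, and the class and method names are hypothetical. Each item goes to the node with the highest value among the nodes that exist.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Replays the rendezvous illustration with the slide's H(item, node) values.
final class RendezvousExample {
    private RendezvousExample() {}

    // Values for N0, N1, N2, in that order, copied from the slide.
    static Map<String, int[]> slideHashes() {
        Map<String, int[]> table = new LinkedHashMap<>();
        table.put("U", new int[] {27, 25, 7});
        table.put("B", new int[] {24, 49, 18});
        table.put("G", new int[] {34, 42, 27});
        table.put("Z", new int[] {18, 17, 29});
        table.put("p", new int[] {30, 2, 8});
        table.put("k", new int[] {5, 13, 40});
        return table;
    }

    // argmax over the first numNodes hash values of one item.
    static int nodeFor(int[] hashes, int numNodes) {
        int best = 0;
        for (int i = 1; i < numNodes; i++) {
            if (hashes[i] > hashes[best]) {
                best = i;
            }
        }
        return best;
    }

    static int countMigrations(Map<String, int[]> table, int oldNodes, int newNodes) {
        int moved = 0;
        for (int[] hashes : table.values()) {
            if (nodeFor(hashes, oldNodes) != nodeFor(hashes, newNodes)) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, int[]> e : slideHashes().entrySet()) {
            System.out.printf("%s: N%d -> N%d%n", e.getKey(),
                    nodeFor(e.getValue(), 2), nodeFor(e.getValue(), 3));
        }
        System.out.println("migrations: " + countMigrations(slideHashes(), 2, 3)); // 2
    }
}
```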

9 Discussion
- The migration and load-balancing properties are ideal
  - Each node gets around 1/N of the items
  - When the (N+1)-th node joins, around 1/(N+1) of the items migrate
- No memory needs to be kept
- The time to compute the hash is O(N)
  - This might become problematic when N is large
  - Good-quality hash functions are usually costly
  - In stream processing, per-tuple processing cost is critical: at 0.1 milliseconds per tuple, you cannot go over 10K tuples/sec
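The 10K tuples/sec figure is the reciprocal of the per-tuple cost, assuming a single channel processes tuples one at a time. Because rendezvous hashing evaluates the hash once per node, that cost grows at least linearly in N. The symbols t_tuple (per-tuple cost) and t_hash (cost of one hash evaluation) are introduced here only for illustration.

```latex
\[
  \text{throughput} \le \frac{1}{t_{\text{tuple}}}
    = \frac{1}{0.1\,\text{ms}}
    = \frac{1}{10^{-4}\,\text{s}}
    = 10^{4}\ \text{tuples/s},
  \qquad
  t_{\text{tuple}} \ge N \cdot t_{\text{hash}}
\]
```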

10 Consistent Hashing
- Consists of two parts
  - Building the hash function for a given number of nodes
  - Computing the hash value for a given item
- Building the hash function
  - First intuition (partitioning)
    - Take a unit circle and partition it among the N nodes, i.e., each node owns a region on the unit circle
    - Items are mapped to points on the unit circle as well
    - The hash value of an item is the node that owns the region containing the item's point
  - We need two things for this to work:
    - The partitioning should stay mostly stable as the number of nodes increases or decreases
    - The sizes of the regions assigned to nodes should be balanced
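To see why stability is a real requirement, consider one naive partitioning (a hypothetical example, not from the slides): cut the circle [0, 1) into N equal contiguous arcs, with node i owning [i/N, (i+1)/N). This is perfectly balanced, but going from N to N+1 nodes changes the owner of half the circle for every N, not 1/(N+1). The sketch measures this on 1000 evenly spaced points.

```java
// Hypothetical naive partitioning: N equal contiguous arcs, node i owns [i/N, (i+1)/N).
final class EqualArcPartition {
    private EqualArcPartition() {}

    // Node owning point x in [0, 1) when the circle is cut into numNodes equal arcs.
    static int ownerOf(double x, int numNodes) {
        return (int) (x * numNodes);
    }

    // Fraction of evenly spaced points whose owner changes from n to n + 1 nodes.
    static double movedFraction(int points, int n) {
        int moved = 0;
        for (int i = 0; i < points; i++) {
            double x = (i + 0.5) / points; // midpoint of the i-th slice of the circle
            if (ownerOf(x, n) != ownerOf(x, n + 1)) {
                moved++;
            }
        }
        return (double) moved / points;
    }

    public static void main(String[] args) {
        // Going from 2 to 3 nodes moves half of the points, although 1/3 would be ideal.
        System.out.println(movedFraction(1000, 2)); // 0.5
    }
}
```

Arc i keeps only its overlap with the new arc i, of length (N - i)/(N(N+1)); summing over i = 0..N-1 gives exactly 1/2, which is why the next slide places nodes by hashing instead.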

11 Consistent Hashing - continued
- Building the hash function (continued)
  - Second intuition (hashing)
    - To keep the partitioning from changing drastically as the number of nodes changes, we hash the nodes to determine their locations on the unit circle
    - Typically, a 128-bit hash space is used as the unit circle
    - Each node owns the part of the unit circle up to the next node's location (clockwise)
[Figure: nodes N0 and N1 on the circle; each arc is labeled "All items hashing to this area are assigned to N0" or "... assigned to N1"]
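A minimal sketch of the second intuition, with one point per node and no virtual nodes yet. Assumptions: the ring is the 64-bit long range rather than 128 bits, a SplitMix64-style mixer stands in for the hash function, and nodes are hashed via the hypothetical names "node-" + id. Following the slide's convention, a node owns the arc from its own point clockwise up to the next node, so the lookup takes the nearest node at or before the item's point (many implementations use the successor instead; either convention works the same way).

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a hash ring: one point per node, TreeMap keyed by ring position.
// Assumptions: 64-bit ring, SplitMix64-style mixer, hypothetical "node-" + id naming.
final class HashRing {
    private final TreeMap<Long, Integer> ring = new TreeMap<>(); // position -> node

    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    static long nodePosition(int node) {
        return mix(("node-" + node).hashCode());
    }

    void addNode(int node) {
        ring.put(nodePosition(node), node);    // takes over part of one existing arc
    }

    void removeNode(int node) {
        ring.remove(nodePosition(node));       // its arc goes to the preceding node
    }

    // Assumes at least one node. Owner = nearest node at or before the item's point;
    // points before the first node wrap around to the last node.
    int nodeFor(Object item) {
        Map.Entry<Long, Integer> owner = ring.floorEntry(mix(item.hashCode()));
        return (owner != null ? owner : ring.lastEntry()).getValue();
    }

    // Grows the ring from oldNodes to oldNodes + 1 nodes;
    // returns {items moved, items moved between existing nodes}.
    static int[] countMoves(int items, int oldNodes) {
        HashRing r = new HashRing();
        for (int n = 0; n < oldNodes; n++) {
            r.addNode(n);
        }
        int[] before = new int[items];
        for (int k = 0; k < items; k++) {
            before[k] = r.nodeFor(k);
        }
        r.addNode(oldNodes);
        int moved = 0;
        int movedBetweenExisting = 0;
        for (int k = 0; k < items; k++) {
            int after = r.nodeFor(k);
            if (after != before[k]) {
                moved++;
                if (after != oldNodes) {
                    movedBetweenExisting++;
                }
            }
        }
        return new int[] {moved, movedBetweenExisting};
    }

    public static void main(String[] args) {
        int[] moves = countMoves(100_000, 4);
        System.out.printf("moved: %d, moved between existing nodes: %d%n", moves[0], moves[1]);
    }
}
```

Every moved item goes to the new node, which gives the stability the slide asks for. How many items move depends on where the new node lands, which is exactly the balance problem discussed next.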

12 Consistent Hashing - continued
- The hashing idea helps keep the partitions stable
  - When a node is inserted, it takes over a sub-region from another node
  - When a node is removed, it gives its region to another node
- However, there is a problem with the balance requirement
  - How do we make sure that each node gets a similarly sized region?
- Consider the following toy problem:
  - You have a gold circle and you want to divide it equally between yourself and your best friend
  - There is a machine that can cut the circle into k random pieces
  - Obviously, if you ask the machine to cut it into 2 pieces, the result might be quite unfair to one of you
  - You want to minimize the probability of such an unfair partitioning
  - An idea most of us would come up with: instruct the machine to cut the gold circle into a large number of pieces, and then distribute the pieces between the two of us

13 Consistent Hashing - continued
- Building the hash function (continued)
  - Third intuition (virtual nodes)
    - Assign R virtual nodes to each node on the unit circle, i.e., each node owns multiple regions on the circle
[Figure: the virtual nodes of N0, N1, and N2 placed on the circle]
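The virtual-node idea is the "cut into many pieces" answer to the toy problem. The sketch below places R virtual nodes per node on a ring and compares the load of the busiest node against the mean, for R = 1 and R = 200. Assumptions: 64-bit ring, SplitMix64-style mixer, virtual node v of node n hashed from the pair (n, v), and item keys salted with a different constant than node keys; the class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch of a ring with R virtual nodes per node (same lookup convention as slide 11).
final class VirtualNodeRing {
    private final TreeMap<Long, Integer> ring = new TreeMap<>(); // virtual node position -> node

    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    // Hypothetical placement: virtual node v of node n is hashed from (n << 32) | v.
    VirtualNodeRing(int numNodes, int replicas) {
        for (int n = 0; n < numNodes; n++) {
            for (int v = 0; v < replicas; v++) {
                ring.put(mix(((long) n << 32) | v), n);
            }
        }
    }

    // Item keys are salted with a different constant than the node keys above.
    int nodeFor(long itemKey) {
        Map.Entry<Long, Integer> owner = ring.floorEntry(mix(itemKey + 0x9E3779B97F4A7C15L));
        return (owner != null ? owner : ring.lastEntry()).getValue();
    }

    // Most loaded node's item count divided by the mean count (1.0 = perfect balance).
    static double imbalance(int numNodes, int replicas, int items) {
        VirtualNodeRing r = new VirtualNodeRing(numNodes, replicas);
        int[] load = new int[numNodes];
        for (long k = 0; k < items; k++) {
            load[r.nodeFor(k)]++;
        }
        int max = 0;
        for (int l : load) {
            max = Math.max(max, l);
        }
        return max / ((double) items / numNodes);
    }

    public static void main(String[] args) {
        // One point per node (R = 1) versus R = 200 virtual nodes per node.
        System.out.printf("R=1: %.2f, R=200: %.2f%n",
                imbalance(8, 1, 100_000), imbalance(8, 200, 100_000));
    }
}
```

With more pieces per node, each node's share is a sum of many small random arcs, so it concentrates around 1/N.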

14 Consistent Hashing - continued
- Computing the hash function
  - Hash the item to a location on the unit circle
  - Return the node whose virtual node owns that location
- Properties
  - Each node gets around 1/N of the items
  - When the (N+1)-th node joins, around 1/(N+1) of the items migrate
  - Requires N * R memory
    - R is the number of virtual nodes per node, typically at least 100 or 200
  - The time to compute the hash is O(log N) for a fixed R
    - The N * R virtual nodes are kept in a sorted binary tree, so a lookup takes O(log(N * R)) time
    - This can be reduced to O(1) by keeping N equal-sized buckets over the ring and a separate tree for each bucket; each bucket then holds about R virtual nodes on average
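The bucketed lookup from the last point can be sketched as follows, compared against a single-tree lookup. Assumptions: 64-bit ring, SplitMix64-style mixer, the placement and ownership conventions of the earlier sketches, and an order-preserving bucket index computed from the top 32 bits of a position. If an item's own bucket has no virtual node at or before the item's point, the owner is the last virtual node of the nearest non-empty bucket counterclockwise. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: N equal-sized buckets over a 64-bit ring, each with its own small tree.
final class BucketedRing {
    private final TreeMap<Long, Integer> ring = new TreeMap<>(); // single tree, for comparison
    private final List<TreeMap<Long, Integer>> buckets = new ArrayList<>();

    static long mix(long z) {
        z = (z ^ (z >>> 30)) * 0xBF58476D1CE4E5B9L;
        z = (z ^ (z >>> 27)) * 0x94D049BB133111EBL;
        return z ^ (z >>> 31);
    }

    BucketedRing(int numNodes, int replicas) {
        for (int b = 0; b < numNodes; b++) {
            buckets.add(new TreeMap<>()); // N buckets, as on the slide
        }
        for (int n = 0; n < numNodes; n++) {
            for (int v = 0; v < replicas; v++) {
                long pos = mix(((long) n << 32) | v);
                ring.put(pos, n);
                buckets.get(bucketOf(pos)).put(pos, n);
            }
        }
    }

    // Order-preserving: every position in bucket b is smaller than those in bucket b + 1.
    int bucketOf(long pos) {
        long top32 = (pos ^ Long.MIN_VALUE) >>> 32; // flip the sign bit: signed order -> unsigned order
        return (int) ((top32 * buckets.size()) >>> 32);
    }

    static long itemPosition(long itemKey) {
        return mix(itemKey + 0x9E3779B97F4A7C15L);
    }

    // O(log(N * R)): one search over all virtual nodes.
    int nodeForTree(long itemKey) {
        Map.Entry<Long, Integer> e = ring.floorEntry(itemPosition(itemKey));
        return (e != null ? e : ring.lastEntry()).getValue();
    }

    // Searches only the item's bucket (about R entries on average), independent of N.
    int nodeForBucketed(long itemKey) {
        long p = itemPosition(itemKey);
        int b = bucketOf(p);
        Map.Entry<Long, Integer> e = buckets.get(b).floorEntry(p);
        if (e != null) {
            return e.getValue();
        }
        // Walk counterclockwise (wrapping around) to the nearest non-empty bucket.
        int numBuckets = buckets.size();
        for (int step = 1; step <= numBuckets; step++) {
            TreeMap<Long, Integer> prev = buckets.get(Math.floorMod(b - step, numBuckets));
            if (!prev.isEmpty()) {
                return prev.lastEntry().getValue();
            }
        }
        throw new IllegalStateException("ring is empty");
    }

    static int countMismatches(int numNodes, int replicas, int items) {
        BucketedRing r = new BucketedRing(numNodes, replicas);
        int mismatches = 0;
        for (long k = 0; k < items; k++) {
            if (r.nodeForTree(k) != r.nodeForBucketed(k)) {
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) {
        System.out.println("mismatches: " + countMismatches(16, 100, 100_000)); // 0
    }
}
```

With N * R virtual nodes spread over N buckets, an empty bucket is unlikely once R is in the hundreds, so the fallback loop rarely runs more than once.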

15 Experiments - Balance
[Figure; recoverable labels: itemCount, domainSize, numNodes 16]

16 Experiments - Migration

17 Experiments – Evaluation Cost

18 ? !

