School of Computing Clemson University Fall, 2012


1 Lecture 6. Hashing Applications III
CpSc 212: Algorithms and Data Structures
Brian C. Dean
School of Computing, Clemson University, Fall 2012

2 Hashing in Networking: Routing Tables
A router is a computer that forwards each incoming packet to the appropriate output port based on an internal “routing table”. Routers need to operate quickly: packets must be forwarded as fast as possible.
[Figure: a router whose routing table maps destination addresses to output ports (one entry → port 3, another → port 2); incoming packets from network; output ports.]

3 Hashing in Networking: Routing Tables
Routing tables also contain rules that apply to entire blocks of destination IP addresses. Example: “1.2.*.*/16” stands for the block of IP addresses with 1.2 as their initial 16 bits. Multiple rules can now apply to an incoming packet.
[Figure: routing table with a /16 rule → port 3 and a /24 rule → port 2; incoming packets from network; output ports.]

4 Hashing in Networking: Routing Tables
Routing tables also contain rules that apply to entire blocks of destination IP addresses. Example: “1.2.*.*/16” stands for the block of IP addresses with 1.2 as their initial 16 bits. Multiple rules can now apply to an incoming packet; the most specific rule (the one with the longest matching prefix) should be used.
[Figure: routing table with a /16 rule → port 3 and a /24 rule → port 2; a packet with a given destination address; output ports.]
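The "most specific rule wins" idea can be sketched in a few lines of C. This is only an illustration: the `Rule` struct, the function names, and the linear scan are assumptions, and the scan simply stands in for whatever lookup structure a real router uses. In the usage note, only the ports 3 and 2 come from the slide; the concrete prefixes are made up.

```c
#include <stdint.h>
#include <stddef.h>

/* One routing rule: a destination prefix, its length in bits, and an output
   port. The struct and its field names are hypothetical. */
typedef struct {
    uint32_t prefix;  /* network address, e.g. 1.2.0.0 packed as 0x01020000 */
    int      length;  /* number of leading bits that must match (0..32) */
    int      port;    /* output port for matching packets */
} Rule;

/* Mask with the top `length` bits set. Length 0 is handled separately
   because shifting a 32-bit value by 32 bits is undefined in C. */
static uint32_t prefix_mask(int length) {
    return length == 0 ? 0u : (uint32_t)(~(uint32_t)0 << (32 - length));
}

/* Return the port of the longest matching rule, or -1 if no rule matches. */
int route(const Rule *rules, size_t n, uint32_t dest) {
    int best_len = -1, best_port = -1;
    for (size_t i = 0; i < n; i++) {
        uint32_t mask = prefix_mask(rules[i].length);
        if ((dest & mask) == (rules[i].prefix & mask) &&
            rules[i].length > best_len) {
            best_len = rules[i].length;
            best_port = rules[i].port;
        }
    }
    return best_port;
}
```

With a /16 rule for 1.2.0.0 → port 3 and a /24 rule for 1.2.3.0 → port 2, a packet for 1.2.3.4 matches both rules and is sent to port 2, while a packet for 1.2.9.4 matches only the /16 rule and goes to port 3.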

5 Aside: Bit Fiddling
An IP address is stored in a 4-byte (32-bit) unsigned integer. E.g., “1.2.3.4” is the integer 16909060, stored in binary as:
A = 00000001 00000010 00000011 00000100
      “1”      “2”      “3”      “4”
We often want to get / set / toggle individual runs of bits in a binary number like this. E.g., “set the 3rd ‘octet’ to 5 instead of 3”.
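As a small sketch of the storage scheme, the function below packs four octets into one 32-bit integer with the first octet in the most significant byte. The name `pack_ip` is hypothetical; the example uses the octets 1, 2, 3, 4 that label the slide's picture.

```c
#include <stdint.h>

/* Pack the four octets a.b.c.d into one 32-bit integer, with the first
   octet in the top byte. */
uint32_t pack_ip(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8)  |  (uint32_t)d;
}
```

For the octets 1, 2, 3, 4 this yields 0x01020304, which is 16909060 in decimal.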

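6 Aside: Bit Fiddling
A & B computes the bitwise “AND” of A and B.
A >> x shifts A right by x bits (dividing by 2^x, rounding down).
Example: extract the 3rd octet of A, with B a mask whose only 1 bits cover that octet: (A & B) >> 8 = [ …zeros… ] followed by the 8 bits of the octet.
Similarly, A << x shifts A left by x bits (which multiplies by 2^x, as long as no 1 bits fall off the top).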
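The mask-then-shift pattern generalizes to any octet. The sketch below assumes octets are counted from 1 (leftmost) to 4 (rightmost), as in the slide's "3rd octet" example; the name `get_octet` is hypothetical.

```c
#include <stdint.h>

/* Extract octet k of a packed IPv4 address, counting k = 1 (leftmost)
   through 4 (rightmost). */
uint8_t get_octet(uint32_t addr, int k) {
    int shift = 8 * (4 - k);                 /* the 3rd octet sits 8 bits up */
    uint32_t mask = (uint32_t)255 << shift;  /* plays the role of B */
    return (uint8_t)((addr & mask) >> shift);
}
```

For k = 3 the shift is 8, so this is exactly the (A & B) >> 8 expression with B = 255 << 8.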

7 Aside: Bit Fiddling
A | B computes the bitwise “OR” of A and B.
A ^ B computes the bitwise “XOR” of A and B.
~A is the bitwise complement of A (toggles all bits from 0 to 1 and vice versa).
How do we zero out the 3rd octet of A? A = A & (~(255 << 8));
How do we set the 3rd octet of A to a new value x? A = (A & (~(255 << 8))) | (x << 8);
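The clear-then-OR recipe can be wrapped in a helper, and XOR with a mask gives the matching "toggle" operation. Both function names are hypothetical, and octets are again assumed to be counted from 1 (leftmost) to 4 (rightmost).

```c
#include <stdint.h>

/* Replace octet k (1 = leftmost, 4 = rightmost) of addr with x:
   clear the octet with AND and a complemented mask, then OR in x. */
uint32_t set_octet(uint32_t addr, int k, uint8_t x) {
    int shift = 8 * (4 - k);
    uint32_t cleared = addr & ~((uint32_t)255 << shift);
    return cleared | ((uint32_t)x << shift);
}

/* Flip every bit of octet k by XOR-ing with a mask of eight 1 bits. */
uint32_t toggle_octet(uint32_t addr, int k) {
    int shift = 8 * (4 - k);
    return addr ^ ((uint32_t)255 << shift);
}
```

Setting the 3rd octet of 1.2.3.4 to 5 gives 1.2.5.4, the "5 instead of 3" example from the earlier slide.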

11 XOR Tricks
XOR is a wonderful operation, since applying it a second time cancels out the first application: (a XOR b) XOR b = a.
Swap two integers without using a temporary variable: a = a^b; b = a^b; a = a^b;
Simple problem: given an integer array A[1…N], all the numbers in A except one occur an even number of times. Find the number appearing an odd number of times.
Solution: XOR(A[1] … A[N]), i.e. the XOR of all N entries.
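Both tricks fit in a few lines of C. This is a sketch with hypothetical function names. The pointer-equality guard in the swap is an addition not on the slide: if both pointers refer to the same object, the first XOR would zero it and the value would be lost.

```c
#include <stddef.h>

/* XOR all elements together. Values that occur an even number of times
   cancel, leaving the one value that occurs an odd number of times. */
int odd_one_out(const int *a, size_t n) {
    int acc = 0;
    for (size_t i = 0; i < n; i++)
        acc ^= a[i];
    return acc;
}

/* Swap two integers without a temporary, using the three XOR steps. */
void xor_swap(int *x, int *y) {
    if (x == y) return;  /* same object: nothing to swap, and XOR would zero it */
    *x ^= *y;            /* a = a ^ b */
    *y ^= *x;            /* b = a ^ b, which is now the old a */
    *x ^= *y;            /* a = a ^ b, which is now the old b */
}
```

The array solution runs in O(N) time with O(1) extra space, with no hash table or sorting needed.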

12 Another Nice Problem
Alice and Betty are sitting across from each other. Each is wearing a hat bearing the number 0 or 1 (the two numbers could be the same). They can’t see their own hats, only each other’s. Alice and Betty write down guesses A and B for their own numbers. If at least one of the guesses is correct, they win! Is there a strategy that is guaranteed to win?
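The slide leaves the question open. One strategy that does work, sketched here as an assumption rather than as the lecture's intended answer, is for Alice to bet that the hats are equal and Betty to bet that they differ; exactly one of those bets is always right. In XOR terms, Alice guesses b and Betty guesses a XOR 1. The code below checks all four hat assignments; the function names are hypothetical.

```c
/* Alice sees Betty's hat and bets that the two hats are equal. */
int alice_guess(int betty_hat) { return betty_hat; }

/* Betty sees Alice's hat and bets that the two hats differ. */
int betty_guess(int alice_hat) { return alice_hat ^ 1; }

/* Return 1 if at least one guess is correct for every hat assignment. */
int strategy_always_wins(void) {
    for (int a = 0; a <= 1; a++)
        for (int b = 0; b <= 1; b++)
            if (alice_guess(b) != a && betty_guess(a) != b)
                return 0;  /* both wrong for this assignment */
    return 1;
}
```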

13 Load Balancing
Large websites often consist of multiple servers sitting behind a router. We would like to balance the load assigned to these servers…
[Figure: incoming packets from the network reach a router, which forwards them to Servers 1–4.]

14 Load Balancing
Random assignment would balance the load. However, it doesn’t consistently assign the same incoming source IP to the same server (useful if servers maintain “shopping carts” or other state).
Good alternative solution: map each packet with source IP address A to server h(A), where h is a hash function.
[Figure: incoming packets from the network reach a router, which forwards them to Servers 1–4.]
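A minimal sketch of the h(A) idea: scramble the source address with an integer mixing function and reduce it modulo the number of servers. The mixing function `mix` is a stand-in chosen for illustration, not the hash used in the lecture, and `pick_server` is a hypothetical name.

```c
#include <stdint.h>

/* A simple integer mixing function, used here only as a stand-in for h. */
static uint32_t mix(uint32_t x) {
    x ^= x >> 16;
    x *= 0x45d9f3bu;
    x ^= x >> 16;
    return x;
}

/* Map a source IP to one of num_servers servers, numbered from 0.
   num_servers must be greater than 0. */
unsigned pick_server(uint32_t src_ip, unsigned num_servers) {
    return mix(src_ip) % num_servers;
}
```

Because the mapping is a pure function of the address, every packet from the same source IP lands on the same server, which is the property random assignment lacks.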

15 Consistent Hashing: Motivation
OK, so we are mapping packets with source IP address A to server h(A). What if servers fail unexpectedly, or are added to or removed from the website? How can we update our assignments now?
[Figure: incoming packets from the network reach a router, which forwards them to Servers 1–4.]
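To see why a changing server count is a problem, suppose (as an assumption, since the slide does not fix h) that the server is chosen as hash mod count. The sketch below counts how many hash values change server when the count changes; `count_reassigned` is a hypothetical name.

```c
#include <stdint.h>
#include <stddef.h>

/* Count the hash values in [0, n_keys) whose server changes when the number
   of servers goes from old_n to new_n under the rule server = hash % count. */
size_t count_reassigned(uint32_t n_keys, unsigned old_n, unsigned new_n) {
    size_t moved = 0;
    for (uint32_t h = 0; h < n_keys; h++)
        if (h % old_n != h % new_n)
            moved++;
    return moved;
}
```

Going from 4 servers to 3, a hash value keeps its server only when its remainder mod 12 is 0, 1, or 2, so 9000 of the 12000 hash values 0…11999 are reassigned. Losing one server out of four disturbs three quarters of the assignments, not just the quarter that lived on the failed server.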

16 Consistent Hashing: One Approach
Hash source IPs AND servers to a circle. I.e., instead of mapping keys → table cells, map both the keys and the cells to a common space! Each IP address is now assigned to the next server in clockwise rotation from it on the circle. Does this fix the problem of servers failing?
[Figure: Servers 1–4 placed around a circle.]
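The "next server clockwise" rule can be sketched as follows, assuming the circle is the range of 32-bit hash values and that clockwise means increasing position with wrap-around. Server positions are passed in directly so the example stays deterministic; in practice they would come from hashing each server's identity. The name `ring_owner` is hypothetical.

```c
#include <stdint.h>
#include <stddef.h>

/* Return the index of the server that owns ring position `key`: the server
   with the smallest position >= key, or, if there is none, the server with
   the lowest position overall (wrap-around). Requires n >= 1. */
size_t ring_owner(const uint32_t *server_pos, size_t n, uint32_t key) {
    size_t best = n;    /* n means "no server at or after key found yet" */
    size_t lowest = 0;  /* index of the smallest position seen so far */
    for (size_t i = 0; i < n; i++) {
        if (server_pos[i] < server_pos[lowest])
            lowest = i;
        if (server_pos[i] >= key &&
            (best == n || server_pos[i] < server_pos[best]))
            best = i;
    }
    return best != n ? best : lowest;
}
```

With servers at positions 100, 400, 700, and 900, a key at 450 belongs to the server at 700. If that server fails, the key moves to the server at 900, while keys owned by the other servers stay where they were. The catch is that all of the failed server's load lands on a single neighbor.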

17 Consistent Hashing: One Approach
Why not map several instances of each server to the circle? Now the load from a failing server is spread more uniformly across the other servers. So we have consistency in our assignments as well as some fault tolerance.
[Figure: a circle with several instances of Server 1 placed among Servers 2 and 3.]
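A sketch of the multiple-instances idea: each point on the circle records which server it belongs to, and a failed server's points are simply skipped during lookup. The `Point` struct, the `owner` function, and the hand-picked positions in the usage note are all assumptions for illustration.

```c
#include <stdint.h>
#include <stddef.h>

/* One instance of a server on the circle. Names are hypothetical. */
typedef struct {
    uint32_t pos;     /* position on the circle */
    int      server;  /* id of the server this instance belongs to */
} Point;

/* Return the id of the server owning `key`: the live instance with the
   smallest position >= key, wrapping to the lowest live position.
   Instances of server `failed` are ignored (pass -1 if none has failed).
   Returns -1 if no live instance exists. */
int owner(const Point *pts, size_t n, uint32_t key, int failed) {
    const Point *best = NULL, *lowest = NULL;
    for (size_t i = 0; i < n; i++) {
        if (pts[i].server == failed)
            continue;
        if (!lowest || pts[i].pos < lowest->pos)
            lowest = &pts[i];
        if (pts[i].pos >= key && (!best || pts[i].pos < best->pos))
            best = &pts[i];
    }
    if (best)
        return best->server;
    return lowest ? lowest->server : -1;
}
```

Take instances at 100 (server 0), 200 (server 1), 300 (server 2), 400 (server 0), 500 (server 2), and 600 (server 1). Keys at 50 and 350 both belong to server 0. When server 0 fails, the key at 50 moves to server 1 and the key at 350 moves to server 2, so the failed server's load is split rather than dumped on one neighbor.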

18 Distributed Hash Tables (DHTs)
What if we want to store a huge set of (key, value) pairs using a distributed network so that we can still perform insert, remove, and find? We can’t just replicate the entire table across every server in the network. We would ideally like a decentralized solution, which does not depend on a small set of “root” servers that effectively know the address of the server on which every object is stored.

19 Distributed Hash Tables: Birthday Paradox Solution?
Suppose there are N total servers. What if we store each object in a hash table on only X servers, chosen randomly? To look up an object, we pick a random set of X servers and query them all. What is a good choice for X?
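The slide leaves the choice of X open. One standard way to reason about it, assuming the X storing servers and the X queried servers are chosen independently and uniformly at random, and that X ≤ N/2, is to bound the probability that the two sets are disjoint. Picking the queried servers one at a time, the i-th pick avoids all X storing servers with probability (N − X − i)/(N − i):

```latex
\Pr[\text{lookup misses}]
  = \prod_{i=0}^{X-1} \frac{N - X - i}{N - i}
  = \prod_{i=0}^{X-1} \left( 1 - \frac{X}{N - i} \right)
  \le \left( 1 - \frac{X}{N} \right)^{X}
  \le e^{-X^{2}/N}
```

So X on the order of √N already makes a miss unlikely: with X = 2√N the bound is e^{-4}, about 1.8%. This is the same effect as in the birthday paradox, where about √N random draws from N possibilities are enough to expect a collision.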

20 The “Chord” DHT
Hash object keys and servers to a circle. Objects are stored on the next clockwise server. Problem: there is no centralized “root” server that sees this entire picture and can tell us our next clockwise server. We instead must be able to initiate an operation (insert, remove, find) by contacting an arbitrary server…
[Figure: Servers 1–4 and Keys 1–3 placed around a circle.]

21 The “Chord” DHT
Each server tracks the next server after it along the circle (actually, the next few, for fault tolerance). Now, starting from any server, we can walk around the circle, server by server, until we stop at the first server whose hash is just past the hash of the key of the object we want to find. However, this can be slow if there are many servers (just like a linked list is slow…). How might we speed it up?
[Figure: Servers 1–4 and Keys 1–3 placed around a circle.]

22 The “Chord” DHT
Use longer-range links, like in a skip list! Each node points to a collection of servers around the circle… If each server maintains O(log N) links to successors at distances 1, 2, 4, 8, etc., then we can reach our final destination with only O(log N) hops!
[Figure: Server 1 with links labeled “+1”, “+2”, “+4”, and “+8” around the circle, along with Keys 1 and 2.]
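The O(log N) bound can be illustrated with a deliberately simplified model: assume there are exactly N = 2^m servers, one at every position 0…N−1 of the circle, each with links to the servers 1, 2, 4, …, N/2 positions ahead. Real Chord deployments have servers scattered sparsely on the circle, so this is a sketch of the idea rather than the protocol; `chord_hops` is a hypothetical name.

```c
#include <stdint.h>

/* Number of hops from server `from` to server `to` on a ring of 2^m servers
   (ids 0 .. 2^m - 1), when every server links to the servers 1, 2, 4, ...
   positions ahead. Greedy rule: always take the longest link that does not
   overshoot the target. */
unsigned chord_hops(uint32_t from, uint32_t to, unsigned m) {
    uint32_t ring_mask = (m >= 32) ? 0xFFFFFFFFu : (((uint32_t)1 << m) - 1);
    uint32_t remaining = (to - from) & ring_mask;  /* clockwise distance */
    unsigned hops = 0;
    while (remaining != 0) {
        uint32_t jump = 1;
        while (jump <= remaining / 2)  /* largest power of two <= remaining */
            jump *= 2;
        remaining -= jump;
        hops++;
    }
    return hops;
}
```

Each hop removes the highest 1 bit of the remaining distance, so the hop count equals the number of 1 bits in the distance and is at most m = log2 N. On a 16-server ring, going from server 0 to server 15 takes 4 hops (8 + 4 + 2 + 1) instead of 15 single steps.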

23 Hashing as a Segue Towards Machine Learning…
h( ) = ?

24 Hashing as a Segue Towards Machine Learning…

25 Distilling Objects to Simpler “Feature Vectors”…

26 Distilling Objects to Simpler “Feature Vectors”…
Unclassified object

27 Distilling Objects to Simpler “Feature Vectors”…
Unclassified object

