CSCI 599: Beyond Web Browsers Professor Shahram Ghandeharizadeh Computer Science Department Los Angeles, CA 90089.


1 CSCI 599: Beyond Web Browsers Professor Shahram Ghandeharizadeh Computer Science Department Los Angeles, CA 90089

2 QUIZ 1:
1. When you register your email with Google, Google emails you a key that must be included with each request to their web methods. (True)
2. The Google web service API can be invoked using ASP.NET. (True)
3. Once a client caches the results of a Google search, Google will invalidate this client’s cache when it detects updates to its information system. (False)
4. Napster employed a central server to store the index of all files available for download by a client. (True)
5. CAN assumes nodes that insert (key,value) pairs will periodically refresh their inserted entries. (True)

3 A Scalable Content Addressable Network (CAN) by S. Ratnasamy, P. Francis, M. Handley, R. Karp, S. Shenker.

4 CAN
CAN is composed of individual nodes.
CAN employs a hash function to insert, lookup, and delete (key,value) pairs.
A node stores a chunk, termed a zone, of the entire hash table.
A node maintains information about its neighboring nodes.

5 EXAMPLE HASH FUNCTION
A two-dimensional hash function: h(K) = a 6-bit unsigned integer.
The low three and high three bits form the 2 dimensions of a hash index.
- e.g., h(“Thriller”) = 111011
- Low 3 bits = 011
- High 3 bits = 111
Three bits range in value from 0 (000) to 7 (111).
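The bit split on this slide can be sketched in a few lines. This is only an illustration: the function name `split_hash` is made up here, and the hash function h itself is not specified in the slides, so the sketch starts from an already computed 6-bit value.

```python
def split_hash(h: int) -> tuple[int, int]:
    """Split a 6-bit hash value into (high, low) 3-bit coordinates."""
    if not 0 <= h < 64:
        raise ValueError("expected a 6-bit unsigned integer")
    high = h >> 3       # top three bits
    low = h & 0b111     # bottom three bits
    return high, low


# The slide's example: h("Thriller") = 111011
high, low = split_hash(0b111011)
print(format(high, "03b"), format(low, "03b"))  # prints "111 011"
```

Each coordinate lands in the range 0 to 7, which is what lets the pair index a point in the 8 x 8 address space.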

6 ADDRESS SPACE
A 2-dimensional address space can be partitioned across 64 nodes.
[Figure: 8 x 8 grid; axes labeled "High bits" and "Low bits", each running from 000 to 111]

7 EXAMPLE
A 2-dimensional space partitioned across six nodes.
[Figure: the grid divided into zones owned by nodes 1 to 6; axes labeled "High bits" and "Low bits", each running from 000 to 111]

8 EXAMPLE (CONT…)
h(“Thriller”) = 111011 = (111, 011) = node 6
[Figure: the grid divided into zones owned by nodes 1 to 6; axes labeled "High bits" and "Low bits", each running from 000 to 111]

9 NEIGHBORS
Two nodes are neighbors if their coordinate spans overlap along d-1 dimensions and abut along one dimension.
[Figure: grid of zones 1 to 6]

10 NEIGHBORS (CONT…)
3’s neighbors:
[Figure: grid of zones 1 to 6 showing node 3’s neighbors]

11 NEIGHBORS (CONT…)
5 is not 3’s neighbor because their spans do not overlap along any dimension; the two zones only abut along both dimensions.
[Figure: grid of zones 1 to 6]
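The neighbor rule from the preceding slides can be written as a small predicate. This is a sketch under assumptions: zones are represented as one half-open (lo, hi) span per dimension, the zone values below are hypothetical (the actual layout is only in the figures), and wraparound is ignored.

```python
Zone = tuple[tuple[int, int], ...]  # one half-open (lo, hi) span per dimension


def spans_overlap(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if the two spans share interior points."""
    return a[0] < b[1] and b[0] < a[1]


def spans_abut(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if one span ends exactly where the other begins."""
    return a[1] == b[0] or b[1] == a[0]


def are_neighbors(za: Zone, zb: Zone) -> bool:
    """Overlap along d-1 dimensions and abut along exactly one."""
    overlaps = sum(spans_overlap(a, b) for a, b in zip(za, zb))
    abuts = sum(spans_abut(a, b) for a, b in zip(za, zb))
    return overlaps == len(za) - 1 and abuts == 1


# Hypothetical 2-d zones, not the layout from the figures.
left = ((0, 4), (0, 4))
right = ((4, 8), (0, 8))
diagonal = ((4, 8), (4, 8))

print(are_neighbors(left, right))     # True: abut in one dimension, overlap in the other
print(are_neighbors(left, diagonal))  # False: abut in both dimensions (corner contact)
```

The second call is the situation described for nodes 3 and 5: two zones that only meet at a corner abut along both dimensions and overlap along none.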

12 NEIGHBORS (CONT…)
The coordinate space is a d-torus, so it wraps around. Example: 5’s neighbors:
[Figure: grid of zones 1 to 6 showing node 5’s neighbors]
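Wraparound changes both the abutment test and the notion of distance along an axis. The following is a minimal sketch, assuming each axis is a ring of 8 positions (3 bits) and spans are half-open (lo, hi) pairs; the helper names are illustrative only.

```python
SIZE = 8  # positions per axis with 3 bits per dimension


def spans_abut_on_ring(a: tuple[int, int], b: tuple[int, int], size: int = SIZE) -> bool:
    """Abutment test that treats position `size` as identical to position 0."""
    return a[1] % size == b[0] % size or b[1] % size == a[0] % size


def ring_distance(a: int, b: int, size: int = SIZE) -> int:
    """Shortest distance between two positions on the ring."""
    diff = abs(a - b) % size
    return min(diff, size - diff)


print(spans_abut_on_ring((0, 2), (6, 8)))  # True: the spans touch across the wrap
print(spans_abut_on_ring((0, 2), (4, 6)))  # False
print(ring_distance(7, 0))                 # 1
```

On a torus, a zone on one edge of the grid can therefore be a neighbor of a zone on the opposite edge.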

13 NEIGHBORS (CONT…)
A node maintains information about its neighbors in order to route a lookup, insert, and delete:
[Figure: grid of zones 1 to 6]

14 NODE ADDRESSING
CAN has an associated DNS domain name that resolves to the IP address of one or more CAN bootstrap nodes.
A bootstrap node maintains a partial list of CAN nodes it believes are currently in the system.
A request is routed to one of these nodes.
The contacted node applies the hash function and routes the request towards its target destination (using information about its neighbors).

15 EXAMPLE
A client looks up “Fragile”, h(“Fragile”) = 100010 = (4,2), by contacting N5 (7,0).
Reduce the y-value (high bits) from 7 to 4; increase the x-value (low bits) from 0 to 2.
[Figure: the grid divided into zones owned by nodes 1 to 6; axes labeled "High bits" and "Low bits", each running from 000 to 111]
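The coordinate adjustments in this example can be traced with a simplified greedy walk. This sketch makes two assumptions that are not in the slides: every one of the 64 grid cells is its own node (as in the ADDRESS SPACE slide, rather than the six-zone layout), and the walk corrects the high coordinate first, then the low one. Real CAN routing forwards to whichever neighbor is closest to the destination.

```python
SIZE = 8  # 3 bits per dimension


def ring_step(src: int, dst: int, size: int = SIZE) -> int:
    """Return -1, 0, or +1: the shorter direction from src to dst on a ring."""
    if src == dst:
        return 0
    forward = (dst - src) % size
    return 1 if forward <= size - forward else -1


def route(start: tuple[int, int], target: tuple[int, int]) -> list[tuple[int, int]]:
    """Greedy path of (high, low) cells from start to target."""
    high, low = start
    path = [start]
    while (high, low) != target:
        if high != target[0]:
            high = (high + ring_step(high, target[0])) % SIZE
        else:
            low = (low + ring_step(low, target[1])) % SIZE
        path.append((high, low))
    return path


# N5 at (7,0) routing towards h("Fragile") = (4,2)
print(route((7, 0), (4, 2)))
# [(7, 0), (6, 0), (5, 0), (4, 0), (4, 1), (4, 2)]
```

The high coordinate drops from 7 to 4 and the low coordinate rises from 0 to 2, for five hops in total under these assumptions.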

16 EXAMPLE
A client looks up “Fragile”, h(“Fragile”) = 100010, by contacting N5.
[Figure: the grid divided into zones owned by nodes 1 to 6; axes labeled "High bits" and "Low bits", each running from 000 to 111]

17 EXAMPLE
A client looks up “Fragile”, h(“Fragile”) = 100010, by contacting N5.
[Figure: the grid divided into zones owned by nodes 1 to 6; axes labeled "High bits" and "Low bits", each running from 000 to 111]

18 EXAMPLE
A client looks up “Hey bebe!”, h(“Hey bebe!”) = 110101, by contacting N5; how is the request routed?
[Figure: the grid divided into zones owned by nodes 1 to 6; axes labeled "High bits" and "Low bits", each running from 000 to 111]

19 OBSERVATIONS
Observation 1:
- In a d-dimensional space, each node has 2d neighbors.
- A node maintains information about its neighbors.
- Thus, one may grow the number of nodes without increasing the node state.
Observation 2:
- The average path length grows as O(n^{1/d}) as a function of the number of nodes, n.
Observation 3:
- The path length is O(d n^{1/d}) hops for d dimensions and n nodes.
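To get a feel for these growth rates, the quantities can be tabulated. The slides give only the asymptotic form; the constant d/4 used below is an assumption taken from the CAN paper's estimate for a space partitioned into n equal zones, and the function names are illustrative.

```python
def neighbors_per_node(d: int) -> int:
    """State kept per node: 2d neighbors, independent of n."""
    return 2 * d


def average_path_length(n: int, d: int) -> float:
    """Estimated average hops, (d/4) * n^(1/d), assuming n equal zones."""
    return (d / 4) * n ** (1 / d)


for d in (2, 3, 4):
    hops = average_path_length(4096, d)
    print(f"d={d}: {neighbors_per_node(d)} neighbors, about {hops:.1f} hops for n=4096")
```

Raising d shortens paths at the cost of a few more neighbors per node, while the per-node state stays independent of n.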

20 NUMBER OF DIMENSIONS
Figure 4
- Substantial improvement in going from d=2 to d=4. Beyond 4, the percentage improvement levels off.
- This same observation is shown in Figure 6.

21 NEW NODE
CAN incorporates a new node, say N7, as follows:
- N7 must find a CAN node.
- N7 randomly chooses a point P that maps to a node, say N1, and sends it a join request. N1’s zone will be partitioned between N7 and N1.
- N1 splits its zone in half, retains one half, and hands the other half to N7.
- N7 identifies its neighbors.
- Neighbors of N1 are notified to include N7 for routing.
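The split step of the join can be sketched as follows. The assumptions here: zones are half-open (lo, hi) spans on an integer grid with even width along the split axis, the stored entries are keyed by their hashed point, and the zone and values shown are hypothetical. Neighbor discovery and notification are left out.

```python
def split_zone(zone, axis=0):
    """Halve `zone` along `axis`; return (kept_half, handed_over_half)."""
    lo, hi = zone[axis]
    mid = (lo + hi) // 2
    kept = tuple((lo, mid) if i == axis else span for i, span in enumerate(zone))
    given = tuple((mid, hi) if i == axis else span for i, span in enumerate(zone))
    return kept, given


def contains(zone, point):
    """True if `point` falls inside the half-open spans of `zone`."""
    return all(lo <= c < hi for (lo, hi), c in zip(zone, point))


def join(owner_zone, owner_store, axis=0):
    """Split the owner's zone and move the entries that now belong to the newcomer."""
    kept, given = split_zone(owner_zone, axis)
    kept_store = {p: v for p, v in owner_store.items() if contains(kept, p)}
    new_store = {p: v for p, v in owner_store.items() if contains(given, p)}
    return (kept, kept_store), (given, new_store)


# Hypothetical zone and entries for N1 before N7 joins.
n1_zone = ((0, 4), (0, 8))
n1_store = {(1, 2): "a", (3, 5): "b"}
n1, n7 = join(n1_zone, n1_store)
print(n1)  # (((0, 2), (0, 8)), {(1, 2): 'a'})
print(n7)  # (((2, 4), (0, 8)), {(3, 5): 'b'})
```

After the split, each (key,value) pair lives with whichever node owns the half containing its point.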

22 NEW NODE (Cont…)
Zone belonging to N1 is partitioned between N1 and N7.
[Figure: the grid divided into zones owned by nodes 1 to 7; axes labeled "High bits" and "Low bits", each running from 000 to 111]

23 Questions & Answers

24 NODE REMOVAL (FAILURE)

25 IMPROVEMENTS
Categories of improvements:
- Replication: reduce path length
  - Multiple realities: additional state information (Sec 3.2)
  - Multiple hash functions (Sec 3.5)
  - MAX replica based: additional state information (Sec 3.4)
- Routing of requests: reduce path latency
  - Route requests to a candidate neighbor with minimum RTT: additional state information (Sec 3.3)
- Assignment of nodes to zones
  - Uniform partitioning of space (Sec 3.7): load balancing
  - Topologically close nodes are assigned to the same zone (Sec 3.6): reduce path latency

26 IMPROVEMENTS
A matrix perspective (missing: data caching).
One may consider a combination, e.g., data placement & replication.
[Matrix: columns "Reduce path length", "Reduce path latency", "Load balancing"; rows "Replication", "Better Routing", "Data placement"]

27 REPLICATION: MULTIPLE REALITIES
Maintain multiple, independent coordinate spaces.
Each node is assigned a different zone in each coordinate space.
Here are two realities:
[Figure: two grids of zones 1 to 6 with different zone assignments, labeled "Reality-1" and "Reality-2"]

28 MULTIPLE REALITIES
Replication increases availability of data in the presence of failures.
Figure 5
- The benefit (percentage improvement) with 2 and 3 realities is substantial. It levels off with 4 or more realities.
Figure 6
- Number of neighbors is fixed on the x-axis with both (a) d=2 and r varying, and (b) d varying and r=2.
- To improve routing efficiency, increasing the number of dimensions is more beneficial than increasing the number of realities (given the same amount of space).
- Qualitatively: additional realities provide a higher degree of data availability (in the presence of failures).
- Notice the knee of both curves in Figure 6 (impact of realities and dimensions is marginal beyond a certain point).

29 REPLICATION: MULTIPLE HASH FUNCTION
Use k different hash functions to map a single key to k different points in the coordinate space.
This results in 0 to k replicas of a single key. In case of collisions to a single zone, do not construct replicas.
With a lookup, retrieve the entry from the closest node. (Alternatively, retrieve the entry from all k potential targets, consuming more bandwidth.)
Figure 7
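One way to picture k hash functions is to salt a single hash with the function index. Everything concrete below is an assumption for illustration: the use of SHA-256, the salting scheme, the 6-bit truncation (borrowed from the earlier hash example), and measuring "closest" by coordinate distance on the torus from a point owned by the requesting node.

```python
import hashlib

SIZE = 8  # 3 bits per dimension


def replica_points(key: str, k: int) -> list[tuple[int, int]]:
    """Map one key to k (high, low) points using k salted hashes."""
    points = []
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        value = digest[0] & 0b111111  # keep 6 bits
        points.append((value >> 3, value & 0b111))
    return points


def ring_distance(a: int, b: int, size: int = SIZE) -> int:
    """Shortest distance between two positions on a ring."""
    diff = abs(a - b) % size
    return min(diff, size - diff)


def closest_replica(origin: tuple[int, int], points: list[tuple[int, int]]) -> tuple[int, int]:
    """Pick the replica point with the smallest torus distance from origin."""
    return min(
        points,
        key=lambda p: ring_distance(p[0], origin[0]) + ring_distance(p[1], origin[1]),
    )


points = replica_points("Thriller", 3)
print(points)
print(closest_replica((0, 0), points))
```

Two of the k points may fall in the same zone, which is the collision case where no extra replica is constructed.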

30 CONTROLLED REPLICATION
Multiple nodes share the same zone.
- These nodes are termed peers.
MAXPEERS is a system parameter to control the number of replicas.
Logically, a node has 2d(MAXPEERS) neighbors.
To maintain a fixed amount of state information per node:
- A node selects one neighbor from amongst the peers in each of its neighboring zones.

31 CONTROLLED REPLICATION
Neighbor selection:
- Periodically, a node requests its coordinate neighbor to transmit its peer list.
- Measure the RTT to all nodes in its neighboring zone.
- Retain the node with the lowest RTT as its neighbor.
Replication:
- Increases data availability
- Improves performance: path length, path latency, and load balancing
- Increases update overhead
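The neighbor-selection step reduces to taking a minimum per neighboring zone. In this sketch the zone labels, peer names, and RTT values are all hypothetical, and the RTT measurements are assumed to have been collected already.

```python
def pick_neighbors(peer_rtts: dict[str, dict[str, float]]) -> dict[str, str]:
    """For each neighboring zone, keep the peer with the lowest measured RTT."""
    return {zone: min(rtts, key=rtts.get) for zone, rtts in peer_rtts.items()}


# Hypothetical measurements in milliseconds: zone -> {peer: RTT}
peer_rtts = {
    "north": {"n12": 48.0, "n31": 12.5},
    "east": {"n07": 20.0},
}
print(pick_neighbors(peer_rtts))  # {'north': 'n31', 'east': 'n07'}
```

Because only one peer per neighboring zone is retained, the per-node state stays at 2d entries regardless of MAXPEERS.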

32 CONTROLLED REPLICATION
Table 2: per-hop latency
[Table columns: MAXPEERS, Relative % Improvement]

33 Questions & Answers

34 BETTER ROUTING
Standard routing metric: progress towards the destination in terms of the Cartesian distance.
Better routing:
- Each node measures the network-level round-trip time (RTT) to each of its neighbors.
- For a given destination, a message is forwarded to the neighbor with the max(progress / RTT).
Table 1 (20-40% improvement)
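The max(progress / RTT) rule can be sketched as below. Assumptions: progress is the reduction in plain Euclidean distance to the destination (wraparound ignored for brevity), and the neighbor names, coordinates, and RTTs are hypothetical.

```python
import math


def next_hop(current, destination, neighbors):
    """Pick the neighbor maximizing progress / RTT.

    neighbors maps a name to (coordinates, rtt_ms). Returns None if no
    neighbor is closer to the destination than the current node.
    """
    here = math.dist(current, destination)
    best, best_ratio = None, 0.0
    for name, (coords, rtt) in neighbors.items():
        progress = here - math.dist(coords, destination)
        if progress > 0 and progress / rtt > best_ratio:
            best, best_ratio = name, progress / rtt
    return best


# Neighbor A makes more progress, but B is much cheaper to reach.
neighbors = {"A": ((6, 0), 40.0), "B": ((7, 1), 5.0)}
print(next_hop((7, 0), (4, 2), neighbors))  # B
```

With equal RTTs the rule falls back to the standard metric and picks the neighbor that makes the most progress.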

35 Questions & Answers

36 Topologically sensitive routing
Assign zones to nodes in a manner that assigns neighboring zones to nodes that have a minimum RTT.
- USC’s neighboring node should be UCLA (instead of Cornell), assuming the RTT to UCLA is smaller.
How?
- Identify a set of m well-known machines as landmarks.
- Every CAN node measures its RTT to these machines and maintains a vector listing them from closest to farthest.
- m! orderings of the landmarks are possible.
- Partition the coordinate space into m! portions.
- When a new node joins, it is mapped to the portion with a matching landmark ordering.
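The landmark steps above can be sketched as a mapping from measured RTTs to one of the m! portions. The landmark names, RTT values, and the choice to number portions by lexicographic order of the orderings are assumptions made for this illustration.

```python
from itertools import permutations


def landmark_ordering(rtts: dict[str, float]) -> tuple[str, ...]:
    """Order landmarks from closest to farthest by measured RTT."""
    return tuple(sorted(rtts, key=rtts.get))


def portion_index(ordering: tuple[str, ...], landmarks: list[str]) -> int:
    """Index of this ordering among all m! orderings (lexicographic order)."""
    all_orderings = list(permutations(sorted(landmarks)))
    return all_orderings.index(tuple(ordering))


# Hypothetical RTTs (ms) from a joining node to m = 3 landmarks.
rtts = {"L1": 30.0, "L2": 10.0, "L3": 80.0}
ordering = landmark_ordering(rtts)
print(ordering)                             # ('L2', 'L1', 'L3')
print(portion_index(ordering, list(rtts)))  # 2, one of 3! = 6 portions
```

Nodes with the same ordering are likely to be close to each other in the underlying network, so they end up in the same portion of the coordinate space.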

37 Topologically sensitive routing
Assuming 3 landmarks, the Cartesian space is divided into six portions: m splits along the x-axis, m-1 splits along the y-axis.
[Figure: the grid divided into six portions; axes labeled "High bits" and "Low bits", each running from 000 to 111]

38 Topologically sensitive routing
Figure 8:
- 4 landmarks with a minimum hop distance of 5
- Latency stretch = CAN network latency / average IP network latency
- (2-d with landmark ordering) outperforms (4-d without landmark ordering)

39 Questions & Answers

