The Cache Location Problem. Overview TERCs Vs. Proxies Stability Cache location.


1 The Cache Location Problem

2 Overview TERCs Vs. Proxies Stability Cache location

3 Proxy Web Caching is Good
Saves network bandwidth
Reduces delay
Reduces server’s load
But it is not perfect:
– not everybody uses it (configuration)
– may become a bottleneck and increase delay
– increases delay for unsatisfied pages

4 Transparent En-Route Caches (TERCs)
Caches are located along routes from clients to servers, and are transparent to both server and client
Requests are intercepted by the TERC on their way to the server and are either answered by the cache, if the information exists there, or otherwise forwarded to the server
Advantages:
No configuration required!
No management!
No change required in current network infrastructure
Can be deployed independently within an ISP subnetwork

5 TERCs (-)
Must be on the route from client to server:
– sensitive to route changes
– hierarchies are much harder to implement
Needs to intercept traffic:
– implementation problem
– more complex
– can TERCs work at line speed?
Depends on routing stability and flow stability
Where should TERCs be placed?

6 Route Stability
Published results indicate that routing is stable (Paxson, Labovitz)
We need stability only during the connection lifetime (~1 min.):
– [KRS00] measurements to more than 13,000 destinations show that >93% of connections were stable
– real numbers are probably higher
TCP route caching equivalent of IP addresses

7 Stability of Flows
We built the flow tree from servers:
Data from Bell-Labs servers (www.bell-labs.com, www.multimedia.bell-labs.com)
– Nov. 97 - Jan. 98
– ~14000 different hosts, 1 Gbyte, ~200k cacheable requests (per week)
From log files to results:
– extract unique hosts
– run traceroute for each host
– obtain the routing tree (or is it a DAG?)
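The log-to-tree steps listed on this slide can be sketched roughly as follows. This is a minimal illustration, not the authors' tooling: it assumes the traceroute output has already been parsed into hop lists that start at the server, and the names `build_flow_tree`, `paths`, and `requests` are hypothetical.

```python
from collections import defaultdict

def build_flow_tree(paths, requests):
    """Merge per-host routes into one routing structure rooted at the server.

    paths:    host -> list of hops from the server to the host (server first);
              assumed to come from already-parsed traceroute output.
    requests: host -> number of requests for that host in the server log.
    Returns (parent, flow, multi_parent).
    """
    parent = {}                # node -> first upstream neighbour seen
    flow = defaultdict(int)    # node -> requests whose route crosses the node
    multi_parent = set()       # nodes reached through more than one parent
    for host, hops in paths.items():
        for up, down in zip(hops, hops[1:]):
            if down in parent and parent[down] != up:
                # A second parent means the structure is a DAG, not a tree.
                multi_parent.add(down)
            else:
                parent[down] = up
        for node in hops:
            flow[node] += requests.get(host, 0)
    return parent, dict(flow), multi_parent

# Toy input (made-up hosts and counts, for illustration only).
paths = {
    "h1": ["s", "a", "b", "h1"],
    "h2": ["s", "a", "c", "h2"],
    "h3": ["s", "d", "b", "h3"],
}
requests = {"h1": 5, "h2": 3, "h3": 2}
parent, flow, multi_parent = build_flow_tree(paths, requests)
```

In the toy input, node `b` is reached through both `a` and `d`, so it lands in `multi_parent`; this is the situation the slide's "or is it a DAG?" remark points at.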

8 Stability - Visual

9 Client return rate between days
day   0111  0112  0113  0114  0115  0116  0117  1130  1201  1202  1203  1204  1205  1206
0111     -  4.35     4  3.78  3.69  3.55  3.73  3.25  3.12  3.21  2.96  3.01  2.79  3.36
0112  4.35     -  6.93  6.06  5.66  5.34  3.58  2.77   4.4  3.85  3.86  3.87  4.02  3.33
0113     4  6.93     -  7.48   6.1  6.12  4.26  3.28  4.58  4.25  4.16  4.34  4.25  2.96
0114  3.78  6.06  7.48     -  7.33  6.48  4.07  3.03  4.21  4.23  4.28  4.34  4.25  3.15
0115  3.69  5.66   6.1  7.33     -  7.41   4.3  2.77  3.71  4.02  4.25  3.98   4.2  2.88
0116  3.55  5.34  6.12  6.48  7.41     -  5.38  3.13  4.21  4.56  4.12   4.1  4.36  3.25
0117  3.73  3.58  4.26  4.07   4.3  5.38     -  3.36  2.99  3.14  2.86  2.88  3.18  3.46
1130  3.25  2.77  3.28  3.03  2.77  3.13  3.36     -  4.32  4.08  4.15  3.42  3.49  4.23
1201  3.12   4.4  4.58  4.21  3.71  4.21  2.99  4.32     -     7  6.34  6.06  4.97  3.58
1202  3.21  3.85  4.25  4.23  4.02  4.56  3.14  4.08     7     -  6.88  5.89  5.35  3.94
1203  2.96  3.86  4.16  4.28  4.25  4.12  2.86  4.15  6.34  6.88     -  7.01  5.58  3.48
1204  3.01  3.87  4.34  4.34  3.98   4.1  2.88  3.42  6.06  5.89  7.01     -  7.15  3.95
1205  2.79  4.02  4.25  4.25   4.2  4.36  3.18  3.49  4.97  5.35  5.58  7.15     -  4.82
1206  3.36  3.33  2.96  3.15  2.88  3.25  3.46  4.23  3.58  3.94  3.48  3.95  4.82     -

10 Stability (3)
The relative flow in the tree is stable in time, although the client population changes significantly
Routing is stable for the lifetime of the connection
Placing caches based on past traffic yields good results

11 How Fixed is the Hit Ratio?

12 How Fixed is the Hit Ratio?(2)

13 Where Should the TERCs be Placed?

14 The Model
Wide area network
Requests are represented by a set of demands (of client i from server j)
Goal: minimize average delay = minimize total flow
The hit ratio (p) abstracts cache behavior:
– most hits are due to a small number of popular pages
– full dependency: the same pages are cached everywhere
But part of the flow can come from proxies
=> Each flow is associated with a hit ratio p_{i,j}

15 The General k-cache Location Problem
Instance:
an undirected graph G=(V,E)
a set of demands F={f_{i,j}}
a set of hit ratios P={p_{i,j}}
k, the number of caches
Solution: K, a subset of V of size k
Objective: minimizing total flow
$$\sum_{i,j} f_{i,j} \min_{v \in K \cup \{j\}} \left[ p_{i,j}\, d(i,v) + (1-p_{i,j}) \left( d(i,v) + d(v,j) \right) \right]$$
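To make the objective concrete, here is a small sketch that evaluates the total flow for a given cache set K and finds the best K by exhaustive search. It is only an illustration under stated assumptions: `dist` is a precomputed all-pairs distance table, the function names `total_flow` and `best_placement` are hypothetical, and the toy instance is made up. Exhaustive search is practical only for tiny instances.

```python
from itertools import combinations

def total_flow(dist, demands, hit, caches):
    """Evaluate the slide's objective for one cache set.

    dist[u][v]: distance between nodes u and v (assumed precomputed).
    demands:    {(i, j): f_ij} demand of client i from server j.
    hit:        {(i, j): p_ij} hit ratio of that flow.
    caches:     iterable of nodes holding a cache (the set K).
    """
    total = 0.0
    for (i, j), f in demands.items():
        p = hit[(i, j)]
        # The server j is always a candidate: it answers every request.
        candidates = set(caches) | {j}
        best = min(
            p * dist[i][v] + (1 - p) * (dist[i][v] + dist[v][j])
            for v in candidates
        )
        total += f * best
    return total

def best_placement(nodes, dist, demands, hit, k):
    """Try every subset of size k and return the cheapest one."""
    return min(
        combinations(sorted(nodes), k),
        key=lambda K: total_flow(dist, demands, hit, K),
    )

# Toy instance: four nodes on a line, distances measured in hops.
nodes = range(4)
dist = {u: {v: abs(u - v) for v in nodes} for u in nodes}
demands = {(0, 3): 10, (1, 3): 4}
hit = {(0, 3): 0.5, (1, 3): 0.5}
```

With an empty cache set the inner minimum falls back to v = j, so every demand pays its full distance d(i,j).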

16 The k-TERC Location Problem
Instance:
an undirected graph G=(V,E)
a set of demands F={f_{i,j}}
a set of hit ratios P={p_{i,j}}
k, the number of caches
Solution: K, a subset of V of size k
Objective: minimizing total flow
$$\sum_{i,j} f_{i,j} \min_{\substack{v \in K \cup \{j\} \\ v \text{ on the path from } j \text{ to } i}} \left[ p_{i,j}\, d(i,v) + (1-p_{i,j}) \left( d(i,v) + d(v,j) \right) \right]$$
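The only change from the general problem is that a cache counts for a demand only if it lies on that demand's route. A minimal sketch of the restricted objective follows, assuming each route is given explicitly as a node list and distances are hop counts along it; `terc_flow` and `routes` are hypothetical names, and the toy data is made up.

```python
def terc_flow(routes, demands, hit, caches):
    """Total flow when only caches on the client-server route may answer.

    routes[(i, j)]: nodes on the route from client i to server j, both
                    endpoints included (assumed known, e.g. from traceroute).
    demands, hit:   {(i, j): f_ij} and {(i, j): p_ij}.
    caches:         iterable of nodes holding a TERC.
    Distances are hop counts along the route (an assumption of this sketch).
    """
    cache_set = set(caches)
    total = 0.0
    for (i, j), f in demands.items():
        path = routes[(i, j)]
        p = hit[(i, j)]
        d_ij = len(path) - 1
        best = d_ij  # v = j: no cache is used and the server answers
        for d_iv, v in enumerate(path):
            if v in cache_set:
                d_vj = d_ij - d_iv
                best = min(best, p * d_iv + (1 - p) * (d_iv + d_vj))
        total += f * best
    return total

# Toy instance: clients 0 and 1 both fetch from server 3 over a line.
routes = {(0, 3): [0, 1, 2, 3], (1, 3): [1, 2, 3]}
demands = {(0, 3): 10, (1, 3): 4}
hit = {(0, 3): 0.5, (1, 3): 0.5}
```

A cache at node 0 helps only the demand (0, 3) here, because node 0 is not on the route of (1, 3).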

17 Remarks
A generalization of the p-median problem (in the p-median problem we want to minimize the total cost of serving a set of demands from at most p centers)
In the k-TERC location problem:
– it is enough to solve the problem for fixed p (p_{i,j} = p)
– the optimal set K does not depend on p
– (not true in general)
The k-TERC location problem is a special case of the general k-location problem (p=1/n)

18 The independence of p
[Figure; labels: s, c, TERC, constant]
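One way to see why the optimal K does not depend on p in the TERC case is the short derivation below. It is a sketch under two assumptions that the slides suggest but do not spell out here: the hit ratio is constant (p_{i,j} = p > 0), and distances add up along the route, so that d(i,v) + d(v,j) = d(i,j) for every v on the path from j to i.

```latex
\begin{align*}
p\, d(i,v) + (1-p)\bigl(d(i,v) + d(v,j)\bigr)
  &= p\, d(i,v) + (1-p)\, d(i,j) \\
  &= d(i,j) - p\,\bigl(d(i,j) - d(i,v)\bigr) \\
  &= d(i,j) - p\, d(v,j), \\[4pt]
\sum_{i,j} f_{i,j} \min_{v} \bigl[ d(i,j) - p\, d(v,j) \bigr]
  &= \underbrace{\sum_{i,j} f_{i,j}\, d(i,j)}_{\text{constant}}
     \;-\; p \sum_{i,j} f_{i,j} \max_{v} d(v,j).
\end{align*}
```

Here v ranges over the nodes of K ∪ {j} that lie on the path from j to i. The first term does not depend on K, and the second is p times a quantity that depends on K but not on p, so the same set K minimizes the total flow for every p > 0.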

19 Hardness Results
[Table; columns: line, tree, general graph; rows: one server, m servers; cell entries: Poly. or NP-hard]

20 Placement on a line
[Figure: a line of nodes labeled 0, 1, 2, ..., n-1]
Topology: a line of n nodes
Every node may be a server, a client, or both.
FR(i): the flow demand on the segment (i-1,i). FR can be easily computed from the input.
FC(i, l_o, l_i): the flow on the segment (i-1,i) when the closest caches to i are in l_o and l_i. FC can be computed from the input with p=1.
Note: FR(i) = FC(i, n-1, 0)

21 Placement on a line
C(j, l_o, l_i, k): the overall flow in segment [0,j] when k caches are located optimally inside the segment, and the closest caches to j are in l_o and l_i.

22 The Dynamic Program
Base case (j=1):
For j>1:

23 The Algorithm
1. Compute C(1, l_i, 1, 1) and C(1, l_i, 0, 0) for 1 ≤ l_i ≤ n-1
2. For each j>1 compute C(j, l_o, l_i, k') for all 0 ≤ k' ≤ k and 0 ≤ l_i ≤ j ≤ l_o ≤ n-1
Complexity: O(n^3 k)

24 Optimizing for a single server
The routes from the server to all clients form a tree (actually a DAG)
We’ll use dynamic programming to find the optimal cache locations

25 The Greedy Algorithm
Optimal algorithm using bottom-up dynamic programming:
– not trivial
– complexity O(n k^2 h)
Greedy:
– repeat k times {find the best cache location}
– complexity O(n k)
How bad can it be?
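The greedy rule on this slide ("repeat k times, find the best cache location") can be sketched for the single-server tree as follows. This is a naive illustration, not the authors' implementation: it re-evaluates the whole cost for every candidate, so it does not reach the O(n k) bound quoted above, and it uses exhaustive search rather than the dynamic program as the optimal baseline. It assumes a constant hit ratio p, hop-count distances, and k no larger than the number of non-root nodes; all function names are hypothetical.

```python
from itertools import combinations

def depth(parent, node):
    """Number of hops from node up to the root (the server)."""
    d = 0
    while parent[node] is not None:
        node = parent[node]
        d += 1
    return d

def tree_cost(parent, demand, p, caches):
    """Total flow when each client is served by the first cache on its way up."""
    total = 0.0
    for client, f in demand.items():
        d_ij = depth(parent, client)
        node, d_iv = client, 0
        while node is not None and node not in caches:
            node = parent[node]
            d_iv += 1
        if node is None:
            total += f * d_ij          # no cache on the route: server answers
        else:
            total += f * (p * d_iv + (1 - p) * d_ij)
    return total

def greedy(parent, demand, p, k):
    """Add, k times, the single location that lowers the cost the most."""
    chosen = set()
    candidates = [n for n in parent if parent[n] is not None]
    for _ in range(k):
        best = min(
            (n for n in candidates if n not in chosen),
            key=lambda n: tree_cost(parent, demand, p, chosen | {n}),
        )
        chosen.add(best)
    return chosen

def optimal(parent, demand, p, k):
    """Exhaustive baseline; only usable on very small trees."""
    candidates = [n for n in parent if parent[n] is not None]
    return set(min(
        combinations(candidates, k),
        key=lambda K: tree_cost(parent, demand, p, set(K)),
    ))

# Toy tree rooted at server "s" (made-up demands).
parent = {"s": None, "a": "s", "b": "a", "c": "a", "d": "b", "e": "b"}
demand = {"d": 4, "e": 4, "c": 6}
```

On this toy tree greedy and the exhaustive baseline agree; in general, greedy can do no better than the optimum, and the gap between them is what the slide's closing question asks about.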

26 Greedy Vs. Optimal

27 Dynamic Programming for Tree
First we convert the tree to a binary tree by adding dummy nodes.
Sort all nodes in reverse BFS order: a node’s descendants are numbered before the node itself.
The children of node i are i_R and i_L.
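The two preprocessing steps on this slide can be sketched as below. This is one possible way to do it, not necessarily the authors': dummy nodes are represented as `("dummy", n)` tuples, the names `binarize` and `reverse_bfs_order` are hypothetical, and the sketch leaves out edge lengths (a dummy node would presumably sit at distance zero from the node it splits, so that costs are unchanged).

```python
from collections import deque

def binarize(children, root):
    """Return a child map in which every node has at most two children.

    children: node -> list of child nodes (leaves may be missing).
    A node with more than two children keeps its first child and hands the
    rest to a chain of dummy nodes.
    """
    out = {}
    counter = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        kids = list(children.get(node, []))
        cur = node
        while len(kids) > 2:
            dummy = ("dummy", counter)
            counter += 1
            out[cur] = [kids[0], dummy]
            queue.append(kids[0])
            kids = kids[1:]
            cur = dummy
        out[cur] = kids
        queue.extend(kids)
    return out

def reverse_bfs_order(children, root):
    """List the nodes so that every descendant comes before its ancestors."""
    order = []
    queue = deque([root])
    while queue:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    order.reverse()
    return order

# Toy tree: the root has four children, so two dummy nodes are needed.
tree = {"r": ["a", "b", "c", "d"], "a": ["e"]}
binary = binarize(tree, "r")
order = reverse_bfs_order(binary, "r")
```

Processing nodes in `order` guarantees that both children of a node have been handled before the node itself, which is what a bottom-up dynamic program needs.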

28 Notations
C(i,k’,l) is the cost of a subtree rooted at i with k’ optimally located caches, where the next cache up the tree is at distance l from i.
F(i,k’,l) is the sum of demands in the subtree rooted at i that do not pass through a cache in the solution C(i,k’,l).

29 The Dynamic Program

30 The DP Formula for C(i,k,l)
The cost if a cache is not placed at node i:
The cost if a cache is placed at node i:
Complexity: O(n·h·k) variables => O(n·h·k^2) time complexity
Finer analysis yields O(n·h·k) time complexity

31 The Server’s Point of View

32 Traffic Reduction

33 TERCs Vs. Edge Caches

34 The Server’s Point of View (2)

35 Popularity Stability

