Web Caching and Content Distribution: A View From the Interior
Syam Gadde, Jeff Chase (Duke University); Michael Rabinovich (AT&T Labs - Research)


1 Web Caching and Content Distribution: A View From the Interior Syam Gadde Jeff Chase Duke University Michael Rabinovich AT&T Labs - Research

2 Overview
- Analytical tools have evolved to predict the behavior of large-scale Web caches.
  - Are results from existing large-scale caches consistent with the predictions?
    - NLANR
  - What do the models predict for Content Distribution/Delivery Networks (CDNs)?
- Goal: answer these questions by extending the models to predict interior cache behavior.

3 Generalized Cache/CDN (External View)
[Diagram labels: Clients; Web Caches; CDNs; Origin Servers; {request, reply}; {push, request, reply}]

4 Generalized Cache/CDN (Internal View)
[Diagram labels: Leaf Caches; Interior Caches; root caches; reverse proxies; Request Routing Function ƒ; bound client populations]

5 Goals and Limitations
- Focus on interior cache behavior.
  - Assume leaf caches are ubiquitous.
  - Model CDNs as interior caches.
- Focus on hit ratio (percentage of accesses absorbed by the “cloud”).
  - Ignore push replication; at best it merely reduces some latencies by moving data earlier.
- Focus on “typical” static Web objects.
  - Ignore streaming media and dynamic content.

6 Outline
- Analytical model
  - applied to interior nodes of cache hierarchies
  - applied to CDNs
- Implications of the model for CDNs in the presence of ubiquitous leaf caching
- Match model with observations from the NLANR cache hierarchy
- Conclusion

7 Analytical Model
- [Wolman/Voelker/Levy et al., SOSP 1999]
  - refines [Breslau/Cao et al., 1999], and others
- Approximates asymptotic cache behavior assuming Zipf-like object popularity
  - caches have sufficient capacity
- Parameters:
  - λ = per-client request rate
  - μ = rate of object change
  - p_c = percentage of objects that are cacheable
  - α = Zipf parameter (object popularity)

8 Cacheable Hit Ratio: the Formula
- C_N is the hit ratio for cacheable objects achievable by a population of size N with a universe of n objects. [Wolman/Voelker/Levy et al., SOSP 99]
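A reconstruction of the formula, assuming it matches the form published by Wolman et al. and the term-by-term description on slide 9. Here λ is the per-client request rate, μ the rate of object change, α the Zipf parameter, and C the normalizing constant of the popularity distribution:

```latex
C_N = \int_{1}^{n} \frac{1}{C x^{\alpha}} \cdot
      \frac{1}{1 + \dfrac{\mu C x^{\alpha}}{\lambda N}} \, dx,
\qquad
C = \int_{1}^{n} \frac{1}{x^{\alpha}} \, dx
```

The first factor is the probability that a request is for object x; the second is the probability that x was requested by the population since it last changed.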

9 Inside the Hit Ratio Formula
- Approximates a sum over a universe of n objects...
- ...of the probability of access to each object x...
- ...times the probability x was accessed since its last change.
- C is just a normalizing constant for the Zipf-like popularity distribution (a PDF).
  - C = 1/Ω in [Breslau/Cao 99]
  - 0 < α < 1
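A minimal numerical sketch of the three pieces described on this slide, assuming the integral form from Wolman et al. (access probability 1/(C x^α) times the probability 1/(1 + μ C x^α/(λN)) that x was accessed since its last change). The function names, the single change rate `mu` (the deck uses separate rates for popular and unpopular objects), and the midpoint rule in log space are illustrative choices, not the authors' implementation.

```python
import math


def zipf_norm(n, alpha):
    """Normalizing constant C = integral from 1 to n of x**(-alpha) dx."""
    if abs(alpha - 1.0) < 1e-12:
        return math.log(n)
    return (n ** (1.0 - alpha) - 1.0) / (1.0 - alpha)


def cacheable_hit_ratio(N, n, lam, mu, alpha, steps=20000):
    """Approximate C_N for a population of N clients and n objects.

    Assumed parameters: lam = per-client request rate, mu = rate of
    object change (same time unit as lam), alpha = Zipf parameter.
    Uses the midpoint rule after substituting x = exp(t), so dx = x dt.
    """
    C = zipf_norm(n, alpha)
    h = math.log(n) / steps
    total = 0.0
    for k in range(steps):
        x = math.exp((k + 0.5) * h)
        p_access = 1.0 / (C * x ** alpha)
        p_since_change = 1.0 / (1.0 + mu * C * x ** alpha / (lam * N))
        total += p_access * p_since_change * x * h
    return total
```

For example, `cacheable_hit_ratio(1024, 10**6, 590, 1 / 186, 0.8)` evaluates the sketch for 1024 clients using the request rate, the unpopular-object change rate, and the Zipf parameter listed on slide 24, with a deliberately small universe of one million objects to keep the loop cheap.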

10 An Idealized Hierarchy
[Diagram labels: Level 1 (Root), N_1 clients; Level 2, N_2 clients]
Assume the trees are symmetric to simplify the math. Ignore individual caches and solve for each level.

11 Hit Ratio at Interior Level i
- C_N gives us the hit ratio for a complete subtree covering population N.
- The hit ratio predicted at level i, or at any cache in level i, is given by: “the hits for N_i (at level i) minus the hits captured by level i+1, over the miss stream from level i+1”

12 Root Hit Ratio
- Predicted hit ratio for cacheable objects, observed at the root of a two-level cache hierarchy (i.e., where r_2 = R p_c):
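Transcribing the verbal definition from slide 11 directly gives the expression below. The symbol H_i is introduced here only for illustration, and the form is an assumption based on that wording rather than a copy of the slide's own formula. Both numerator and denominator are fractions of the cacheable request stream, so the cacheable request rate cancels.

```latex
H_i = \frac{C_{N_i} - C_{N_{i+1}}}{1 - C_{N_{i+1}}},
\qquad
H_1 = \frac{C_{N_1} - C_{N_2}}{1 - C_{N_2}}
\quad \text{(root of a two-level hierarchy)}
```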

13 Generalizing to CDNs
[Diagram labels: Leaf Caches (demand side), N_L clients; Interior Caches (supply side), N_I clients; Request Routing Function ƒ(leaf, object, state); N clients]
Symmetry assumption: ƒ is stable and “balanced”.

14 [Diagram labels: CDN1; CDN2; Servers; Leaf Caches; Interior Caches]

15 Servers Leaf Caches Interior Caches N I clients

16 [Diagram labels: Servers; Leaf Caches]
What happens to C_N if we partition the object universe?

17 Servers Leaf Caches

18 Servers Leaf Caches

19 Servers Leaf Caches

20 Servers Leaf Caches

21 [Diagram labels: CDN1; CDN2; Servers; Leaf Caches]

22 Hit ratio in CDN caches
- Given the symmetry and balance assumptions, the cacheable hit ratio at the interior (CDN) nodes is:
N_I is the covered population at each CDN cache. N_L is the population at each leaf cache.
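A small sketch of how the interior (CDN) ratio could be computed from two values of C_N, assuming it has the same "hits beyond the leaves over the leaf miss stream" form stated for hierarchy levels on slide 11. The function names are hypothetical, and the inputs are precomputed values C_{N_I} and C_{N_L}.

```python
def interior_hit_ratio(c_interior, c_leaf):
    """Hit ratio observed at an interior cache.

    c_interior: C_N evaluated at the interior population N_I.
    c_leaf:     C_N evaluated at the leaf population N_L.
    Assumed form: (C_{N_I} - C_{N_L}) / (1 - C_{N_L}).
    """
    if not 0.0 <= c_leaf < 1.0:
        raise ValueError("c_leaf must be in [0, 1)")
    if not c_leaf <= c_interior <= 1.0:
        raise ValueError("c_interior must be in [c_leaf, 1]")
    return (c_interior - c_leaf) / (1.0 - c_leaf)


def marginal_hit_ratio(c_interior, c_leaf):
    """Interior hits as a fraction of all cacheable requests."""
    return c_interior - c_leaf
```

The two functions separate what an interior cache observes about its own request stream from what it contributes to the system as a whole; the plots on slides 25 to 28 make the same distinction.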

23 Analysis
- We apply the model to gain insight into interior cache behavior with:
  - varying leaf cache populations (N_L)
    - e.g., bigger leaf caches
  - varying ratio of interior to leaf cache populations (N_I/N_L)
    - e.g., more specialized interior caches
  - Zipf α parameter changes
    - e.g., more concentrated popularity

24 Analysis (cont’d)
- Fixed parameters (unless noted otherwise):
  - λ (client request rate) = 590 reqs./day
  - μ (rate of object change) =
    - once every 14 days (popular objects, 0.3%)
    - once every 186 days (unpopular objects)
  - p_c (percent of requests cacheable) = 60%
  - α (Zipf parameter - object popularity) = 0.8

25 [Plot] Cacheable interior hit ratio observed at interior level, fixing interior/leaf population ratio. Axis labels: cacheable hit ratio; increasing N_I and N_L.

26 [Plot] Interior hit ratio as percentage of all cacheable requests, fixing interior/leaf population ratio. Axis labels: marginal cacheable hit ratio; increasing N_I and N_L.

27 [Plot] Cacheable interior hit ratio, fixing leaf population. Axis labels: cacheable hit ratio; increasing “bushiness”.

28 [Plot] Cacheable interior hit ratio as percentage of all requests, fixing leaf population. Axis labels: marginal cacheable hit ratio; increasing “bushiness”.

29 [Plot] Cacheable interior hit ratio as percentage of all requests, varying Zipf α parameter, with N_L fixed at 1024 clients. Axis label: cacheable hit ratio.

30 [Plot] Cacheable interior hit ratio as percentage of all requests, varying Zipf α parameter, with N_I/N_L fixed at 64K. Axis label: cacheable hit ratio.

31 Conclusions (I)
- Interior hit ratio captures the effectiveness of upstream caches at reducing access traffic filtered by leaf/edge caches.
  - Hit ratios grow rapidly with covered population.
- Edge cache populations (N_L) are key: is it one thousand or one million?
  - With large N_L, interior ratios are deceptive.
  - At N_L = 10^5, interior hit ratios might be 90%, but the CDN sees less than 20% of the requests.
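A quick piece of arithmetic shows why the interior ratio is deceptive in that last example. Taking the slide's illustrative figures at face value, a 90% hit ratio on a stream that carries less than 20% of the requests means the CDN absorbs less than 18% of all requests:

```latex
\underbrace{0.90}_{\text{interior hit ratio}} \times
\underbrace{0.20}_{\text{share of requests reaching the CDN}} = 0.18
```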

32 Correlating with NLANR Observations
- Do the predictions match observations from existing large-scale caches?
- Observations made from traces provided by NLANR (10/12/99).
  - Observed total hit ratio at the (unified) root is 32%.
  - 200 of the 914 leaf caches in the trace account for 95% of requests.
  - Daily request rate indicates the population is on the order of tens of thousands.
  - What is the predicted N?
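One way to approach "what is the predicted N?" is to invert the model numerically: because the predicted hit ratio grows with the covered population, bisection over N finds the population at which the prediction equals an observed ratio. The sketch below is generic and hypothetical. `predicted_ratio` stands in for whatever model function is being inverted, and the search bounds are arbitrary defaults, not values from the deck.

```python
import math


def solve_population(predicted_ratio, target, lo=1.0, hi=1e9, iters=100):
    """Find N in [lo, hi] where predicted_ratio(N) is closest to target.

    Assumes predicted_ratio is non-decreasing in N. Bisects on log(N),
    since populations of interest span several orders of magnitude.
    """
    if not predicted_ratio(lo) <= target <= predicted_ratio(hi):
        raise ValueError("target is outside the bracketed range")
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # geometric midpoint
        if predicted_ratio(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.sqrt(lo * hi)
```

Any monotone model can be plugged in, for instance a function that maps N to a predicted root hit ratio and is then solved against the observed 32%.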

33 Model vs. Reality
- NLANR roots cooperate; we filter the traces to determine the unified root hit ratio.
- NLANR caches are bounded; traces imply that capacity misses are low at 16GB.
- Analysis assumes the population is balanced across the 200 leaves of consequence.
- Analysis must compensate for objects determined to be uncacheable at a leaf.

34 [Plot] Cacheable interior hit ratio, varying percentage of requests detected as uncacheable by leaves; 200+ leaf caches. Axis label: cacheable hit ratio.

35 [Plot] Cacheable interior hit ratio, varying percentage of requests detected as uncacheable at request time; 1000 clients per leaf cache. Axis label: cacheable hit ratio.

36 Conclusions (II)
- NLANR root effectiveness is around 32% today; it is serving its users well.
- The NLANR experiment could validate the model, but more data from the experiment is needed.
  - E.g., covered populations, leaf summaries
- The model suggests that the population covered by NLANR is relatively small. With larger N and N_L, higher root hit ratios are expected, with lower marginal benefit.


38 Modeling CDNs
- If the routing function satisfies three properties:
  - an interior cache sees all requests for each assigned object x from a population of size N_I
  - every interior cache sees an equivalent object popularity distribution (n/[?] held constant)
  - all requests are routed through leaf caches that serve N_L clients
- then the interior cacheable hit ratio is:

39 Hit ratio with detected uncacheable documents
- p_u is the percentage of uncacheable requests detected at request time (and not forwarded to parents):
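One plausible reading of this adjustment, sketched under explicit assumptions rather than taken from the slide: of every unit of client requests, the leaves absorb p_c · C_{N_L} as hits and drop p_u as detected-uncacheable, so the root receives the remainder and hits on p_c · (C_N − C_{N_L}) of the original stream. The function name and this accounting are hypothetical.

```python
def root_total_hit_ratio(c_root, c_leaf, p_c, p_u):
    """Total hit ratio observed at the root, as a fraction of requests
    that actually reach it (assumed accounting, see lead-in).

    c_root: C_N for the whole covered population.
    c_leaf: C_N for a single leaf's population.
    p_c:    fraction of requests that are cacheable.
    p_u:    fraction of requests detected as uncacheable at the leaves
            and therefore never forwarded; at most 1 - p_c.
    """
    if not 0.0 <= p_u <= (1.0 - p_c) + 1e-12:
        raise ValueError("p_u must be between 0 and 1 - p_c")
    forwarded = 1.0 - p_c * c_leaf - p_u  # requests arriving at the root
    if forwarded <= 0.0:
        raise ValueError("no requests reach the root")
    return p_c * (c_root - c_leaf) / forwarded
```

Under this accounting, when every uncacheable request is detected at the leaves (p_u = 1 − p_c) the result reduces to the cacheable interior ratio (C_N − C_{N_L}) / (1 − C_{N_L}); with less detection, the undetected uncacheable traffic dilutes the ratio seen at the root.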

40 Cache Hierarchies
- As introduced by the Harvest project
  - k levels of demand-side caches arranged in a tree (for now)
  - clients are bound to leaves
  - each node’s miss stream routes to its parent
- As extended by NLANR (Squid)
  - NLANR-operated root caches cooperate by partitioning URL space
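The tree structure described above can be illustrated with a toy simulation. This is a sketch only: the `CacheNode` class is hypothetical, caches are unbounded, objects never change, and the cooperative URL partitioning among NLANR roots is not modeled.

```python
class CacheNode:
    """A demand-side cache whose misses are forwarded to its parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent  # None means this node is the root
        self.objects = set()
        self.requests = 0
        self.hits = 0

    def request(self, url):
        """Return the name of the node (or "origin") that served url."""
        self.requests += 1
        if url in self.objects:
            self.hits += 1
            return self.name
        # Miss: route to the parent, or to the origin server at the root.
        source = self.parent.request(url) if self.parent else "origin"
        self.objects.add(url)  # cache the reply on the way back down
        return source


root = CacheNode("root")
leaf_a = CacheNode("leaf-a", parent=root)
leaf_b = CacheNode("leaf-b", parent=root)
```

Because the root only ever sees the leaves' miss streams, `root.hits / root.requests` in such a simulation corresponds to the interior hit ratio discussed in the deck, not to the hit ratio experienced by clients.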

41 Cache Hierarchies Illustrated
[Diagram labels: Servers; Level 1 (Root); Level 2]

