Web Caching and Content Distribution: A View From the Interior Syam Gadde Jeff Chase Duke University Michael Rabinovich AT&T Labs - Research.

Slides:



Advertisements
Similar presentations
View-Upload Decoupling: A Redesign of Multi-Channel P2P Video Systems Keith Ross Polytechnic Institute of NYU.
Advertisements

G. Alonso, D. Kossmann Systems Group
1 Efficient and Robust Streaming Provisioning in VPNs Z. Morley Mao David Johnson Oliver Spatscheck Kobus van der Merwe Jia Wang.
The Cache Location Problem. Overview TERCs Vs. Proxies Stability Cache location.
The Cache Location Problem IEEE/ACM Transactions on Networking, Vol. 8, No. 5, October 2000 P. Krishnan, Danny Raz, Member, IEEE, and Yuval Shavitt, Member,
Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang.
October 14, 2002MASCOTS Workload Characterization in Web Caching Hierarchies Guangwei Bai Carey Williamson Department of Computer Science University.
An Analysis of Internet Content Delivery Systems Stefan Saroiu, Krishna P. Gommadi, Richard J. Dunn, Steven D. Gribble, and Henry M. Levy Proceedings of.
Improving Proxy Cache Performance: Analysis of Three Replacement Policies Dilley, J.; Arlitt, M. A journal paper of IEEE Internet Computing, Volume: 3.
Analysis of Web Caching Architectures: Hierarchical and Distributed Caching Pablo Rodriguez, Christian Spanner, and Ernst W. Biersack IEEE/ACM TRANSACTIONS.
Web Caching Robert Grimm New York University. Before We Get Started  Interoperability testing  Type theory 101.
1 Probabilistic Models for Web Caching David Starobinski, David Tse UC Berkeley Conference and Workshop on Stochastic Networks Madison, Wisconsin, June.
Flash Crowds And Denial of Service Attacks: Characterization and Implications for CDNs and Web Sites Aaron Beach Cs395 network security.
Web Caching Robert Grimm New York University. Before We Get Started  Illustrating Results  Type Theory 101.
A Hybrid Caching Strategy for Streaming Media Files Jussara M. Almeida Derek L. Eager Mary K. Vernon University of Wisconsin-Madison University of Saskatchewan.
Caching And Prefetching For Web Content Distribution Presented By:- Harpreet Singh Sidong Zeng ECE Fall 2007.
Web Caching Schemes For The Internet – cont. By Jia Wang.
Web Caching and CDNs March 3, Content Distribution Motivation –Network path from server to client is slow/congested –Web server is overloaded Web.
The Medusa Proxy A Tool For Exploring User- Perceived Web Performance Mimika Koletsou and Geoffrey M. Voelker University of California, San Diego Proceeding.
1 ENHANCHING THE WEB’S INFRASTUCTURE: FROM CACHING TO REPLICATION ECE 7995 Presented By: Pooja Swami and Usha Parashetti.
Provisioning Content Distribution Networks for Streaming Media Jussara M. Almeida Derek L. Eager Michael Ferris Mary K. Vernon University of Wisconsin-Madison.
Tradeoffs in CDN Designs for Throughput Oriented Traffic Minlan Yu University of Southern California 1 Joint work with Wenjie Jiang, Haoyuan Li, and Ion.
CSCI-1680 Web Performance and Content Distribution Based partly on lecture notes by Scott Shenker and John Jannotti Rodrigo Fonseca.
Web Cache. Introduction what is web cache?  Introducing proxy servers at certain points in the network that serve in caching Web documents for faster.
Content Distribution Network (CDN) Performance Punit Shah CSE581 Internet Technologies OGI, OHSU 2002, Jan 16th.
Towards Understanding Modern Web Traffic
ACDN: A CDN for Applications Pradnya Karbhari Michael Rabinovich Zhen Xiao Fred Douglis AT&T Labs -- Research.
Large-Scale Web Caching and Content Delivery Jeff Chase CPS 212: Distributed Information Systems Fall 2000.
Web Caching and Content Delivery. Caching for a Better Web Performance is a major concern in the Web Proxy caching is the most widely used method to improve.
By Huang et al., SOSP 2013 An Analysis of Facebook Photo Caching Presented by Phuong Nguyen Some animations and figures are borrowed from the original.
1 Content Distribution Networks. 2 Replication Issues Request distribution: how to transparently distribute requests for content among replication servers.
On the Use and Performance of Content Distribution Networks Balachander Krishnamurthy Craig Wills Yin Zhang Presenter: Wei Zhang CSE Department of Lehigh.
Distributing Content Simplifies ISP Traffic Engineering Abhigyan Sharma* Arun Venkataramani* Ramesh Sitaraman*~ *University of Massachusetts Amherst ~Akamai.
By Ravi Shankar Dubasi Sivani Kavuri A Popularity-Based Prediction Model for Web Prefetching.
Achieving Load Balance and Effective Caching in Clustered Web Servers Richard B. Bunt Derek L. Eager Gregory M. Oster Carey L. Williamson Department of.
{ Content Distribution Networks ECE544 Dhananjay Makwana Principal Software Engineer, Semandex Networks 5/2/14ECE544.
ICN Considerations for ISP’s Existing Networks Lichun Li, Xin Xu, Jun Wang, Zhenwu Hao {xu.xin18, wang.jun17,
Infrastructure for Better Quality Internet Access & Web Publishing without Increasing Bandwidth Prof. Chi Chi Hung School of Computing, National University.
On the Scale and Performance of Cooperative Web Proxy Caching University of Washington Alec Wolman, Geoff Voelker, Nitin Sharma, Neal Cardwell, Anna Karlin,
Network Aware Resource Allocation in Distributed Clouds.
Web Cache Replacement Policies: Properties, Limitations and Implications Fabrício Benevenuto, Fernando Duarte, Virgílio Almeida, Jussara Almeida Computer.
Segment-Based Proxy Caching of Multimedia Streams Authors: Kun-Lung Wu, Philip S. Yu, and Joel L. Wolf IBM T.J. Watson Research Center Proceedings of The.
Huirong Fu and Edward W. Knightly Rice Networks Group Aggregation and Scalable QoS: A Performance Study.
Understanding the Performance of Web Caching System with an Analysis Model and Simulation Xiaosong Hu Nur Zincir-Heywood Sep
Microsoft Research1 Characterizing Alert and Browse Services for Mobile Clients Atul Adya, Victor Bahl, Lili Qiu Microsoft Research USENIX Annual Technical.
Adaptive Web Caching CS411 Dynamic Web-Based Systems Flying Pig Fei Teng/Long Zhao/Pallavi Shinde Computer Science Department.
An IP Address Based Caching Scheme for Peer-to-Peer Networks Ronaldo Alves Ferreira Joint work with Ananth Grama and Suresh Jagannathan Department of Computer.
1 Evaluation of Cooperative Web Caching with Web Polygraph Ping Du and Jaspal Subhlok Department of Computer Science University of Houston presented at.
PROP: A Scalable and Reliable P2P Assisted Proxy Streaming System Computer Science Department College of William and Mary Lei Guo, Songqing Chen, and Xiaodong.
Computer Science Lecture 14, page 1 CS677: Distributed OS Last Class: Concurrency Control Concurrency control –Two phase locks –Time stamps Intro to Replication.
On The Cooperation of Web Clients and Proxy Caches Yiu Fai Sit, Francis C.M. Lau, Cho-Li Wang Department of Computer Science The University of Hong Kong.
Practical LFU implementation for Web Caching George KarakostasTelcordia Dimitrios N. Serpanos University of Patras.
HTTP evolution - TCP/IP issues Lecture 4 CM David De Roure
The LSAM Proxy Cache - a Multicast Distributed Virtual Cache Joe Touch USC / Information Sciences Institute 元智大學 資訊工程研究所 系統實驗室 陳桂慧
Hot Systems, Volkmar Uhlig
The Measured Access Characteristics of World-Wide-Web Client Proxy Caches Bradley M. Duska, David Marwood, and Michael J. Feeley Department of Computer.
Content Delivery Networks: Status and Trends Speaker: Shao-Fen Chou Advisor: Dr. Ho-Ting Wu 5/8/
Sporadic model building for efficiency enhancement of the hierarchical BOA Genetic Programming and Evolvable Machines (2008) 9: Martin Pelikan, Kumara.
An Analysis of Internet Content Delivery Systems 19 rd November, 2007 Youngsub CSE, SNU.
Overview on Web Caching COSC 513 Class Presentation Instructor: Prof. M. Anvari Student name: Wei Wei ID:
On the scale and performance of cooperative Web proxy caching 2/3/06.
Improving the WWW: Caching or Multicast? Pablo RodriguezErnst W. BiersackKeith W. Ross Institut EURECOM 2229, route des Cretes. BP , Sophia Antipolis.
1 Evaluation of Cooperative Web Caching with Web Polygraph Ping Du and Jaspal Subhlok Department of Computer Science University of Houston presented at.
John S. Otto Mario A. Sánchez John P. Rula Fabián E. Bustamante Northwestern, EECS.
The Impact of Replacement Granularity on Video Caching
On the Scale and Performance of Cooperative Web Proxy Caching
Edge computing (1) Content Distribution Networks
Distributed Systems CS
Group Based Management of Distributed File Caches
EE 122: Lecture 22 (Overlay Networks)
Presentation transcript:

Web Caching and Content Distribution: A View From the Interior Syam Gadde Jeff Chase Duke University Michael Rabinovich AT&T Labs - Research

Overview n Analytical tools have evolved to predict behavior of large-scale Web caches. u Are results from existing large-scale caches consistent with the predictions? F NLANR u What do the models predict for Content Distribution/Delivery Networks (CDNs)? n Goal: answer these questions by extending models to predict interior cache behavior.

Generalized Cache/CDN (External View) {request, reply} Origin Servers Clients {push, request, reply} CDNs Web Caches

Generalized Cache/CDN (Internal View) Leaf Caches Interior Caches root caches reverse proxies Request Routing Function ƒ bound client populations ƒ

Goals and Limitations n Focus on interior cache behavior. u Assume leaf caches are ubiquitous. u Model CDNs as interior caches. n Focus on hit ratio (percentage of accesses absorbed by the “cloud”). u Ignore push replication; at best it merely reduces some latencies by moving data earlier. n Focus on “typical” static Web objects. u Ignore streaming media and dynamic content.

Outline n Analytical model u applied to interior nodes of cache hierarchies u applied to CDNs n Implications of the model for CDNs in the presence of ubiquitous leaf caching n Match model with observations from the NLANR cache hierarchy n Conclusion

Analytical Model n [Wolman/Voelker/Levy et. al., SOSP 1999] u refines [Breslau/Cao et. al., 1999], and others n Approximates asymptotic cache behavior assuming Zipf-like object popularity u caches have sufficient capacity n Parameters: u = per-client request rate u  = rate of object change u p c = percentage of objects that are cacheable u  = Zipf parameter (object popularity)

Cacheable Hit Ratio: the Formula n C N is the hit ratio for cacheable objects achievable by population of size N with a universe of n objects. [Wolman/Voelker/Levy et. al., SOSP 99] N

Inside the Hit Ratio Formula Approximates a sum over a universe of n objects......of the probability of access to each object x... …times the probability x was accessed since its last change. C is just a normalizing constant for the Zipf-like popularity distribution (a PDF). C = 1/  in [Breslau/Cao 99] 0 <  < 1 N

Level 2 Level 1 (Root) N 2 clients N 1 clients An Idealized Hierarchy Assume the trees are symmetric to simplify the math. Ignore individual caches and solve for each level.

Hit Ratio at Interior Level i n C N gives us the hit ratio for a complete subtree covering population N n The hit ratio predicted at level i or at any cache in level i is given by: “the hits for N i (at level i) minus the hits captured by level i+1, over the miss stream from level i+1”

Root Hit Ratio n Predicted hit ratio for cacheable objects, observed at root of a two-level cache hierarchy (i.e. where r 2 =Rp c ):

N L clients N clients Generalizing to CDNs Request Routing Function Interior Caches (supply side) N I clients ƒ(leaf, object, state) Leaf Caches (demand side) N L clients Symmetry assumption: ƒ is stable and “balanced”. ƒ

CDN1CDN2 Servers Leaf Caches Interior Caches

Servers Leaf Caches Interior Caches N I clients

Servers Leaf Caches What happens to C N if we partition the object universe?

Servers Leaf Caches

Servers Leaf Caches

Servers Leaf Caches

Servers Leaf Caches

CDN1CDN2 Servers Leaf Caches

Hit ratio in CDN caches n Given the symmetry and balance assumptions, the cacheable hit ratio at the interior (CDN) nodes is: N I is the covered population at each CDN cache. N L is the population at each leaf cache.

Analysis n We apply the model to gain insight into interior cache behavior with:  varying leaf cache populations ( N L ) F e.g., bigger leaf caches  varying ratio of interior to leaf cache populations ( N I /N L ) F e.g., more specialized interior caches u Zipf  parameter changes F e.g., more concentrated popularity

Analysis (cont’d) n Fixed parameters (unless noted otherwise): u (client request rate) = 590 reqs./day u  (rate of object change) = F once every 14 days (popular objects, 0.3%) F once every 186 days (unpopular objects) u p c (percent of requests cacheable) = 60% u  (Zipf parameter - object popularity) = 0.8

Cacheable interior hit ratio observed at interior level fixing interior/leaf population ratio cacheable hit ratio increasing N I and N L -->

Interior hit ratio as percentage of all cacheable requests, fixing interior/leaf population ratio marginal cacheable hit ratio increasing N I and N L -->

Cacheable interior hit ratio fixing leaf population cacheable hit ratio increasing “bushiness” -->

Cacheable interior hit ratio as percentage of all requests fixing leaf population marginal cacheable hit ratio increasing “bushiness” -->

Cacheable interior hit ratio as percentage of all requests varying Zipf  parameter N L fixed at 1024 clients cacheable hit ratio

Cacheable interior hit ratio as percentage of all requests varying Zipf  parameter N I /N L fixed at 64K cacheable hit ratio

Conclusions (I) n Interior hit ratio captures effectiveness of upstream caches at reducing access traffic filtered by leaf/edge caches. u Hit ratios grow rapidly with covered population. Edge cache populations (N L ) are key: is it one thousand or one million?  With large N L, interior ratios are deceptive.  At N L = 10 5, interior hit ratios might be 90%, but the CDN sees less than 20% of the requests.

Correlating with NLANR Observations n Do the predictions match observations from existing large-scale caches? n Observations made from traces provided by NLANR (10/12/99). u Observed total hit ratio at (unified) root is 32% u 200 of the 914 leaf caches in the trace account for 95% of requests u daily request rate indicates population is on the order of tens of thousands u What is the predicted N?

Model vs. Reality n NLANR roots cooperate; we filter the traces to determine the unified root hit ratio. n NLANR caches are bounded; traces imply that capacity misses are low at 16GB. n Analysis assumes the population is balanced across the 200 leaves of consequence. n Analysis must compensate for objects determined to be uncacheable at a leaf.

Cacheable interior hit ratio varying percentage of requests detected as uncacheable by leaves 200+ leaf caches cacheable hit ratio

Cacheable interior hit ratio varying percentage of requests detected as uncacheable at request time 1000 clients per leaf cache cacheable hit ratio

Conclusions (II) n NLANR root effectiveness is around 32% today; it is serving its users well. n NLANR experiment could validate the model, but more data from the experiment is needed. u E.g., covered populations, leaf summaries n The model suggests that the population covered by NLANR is relatively small. With larger N and N L, higher root hit ratios are expected, with lower marginal benefit.

Modeling CDNs n If the routing function satisfies three properties:  an interior cache sees all requests for each assigned object x from a population of size N I u every interior cache sees an equivalent object popularity distribution (n/ held constant)  all requests are routed through leaf caches that serve N L clients n then interior cacheable hit ratio is:

Hit ratio with detected uncacheable documents n p u is the percentage of uncacheable requests detected at request time (and not forwarded to parents):

Cache Hierarchies n As introduced by the Harvest project u k levels of demand-side caches arranged in a tree (for now) u clients are bound to leaves u each node’s miss stream routes to its parent n As extended by NLANR (Squid) u NLANR-operated root caches cooperate by partitioning URL space

Servers Level 2 Level 1 (Root) Cache Hierarchies Illustrated