Understanding the Performance of Web Caching System with an Analysis Model and Simulation Xiaosong Hu Nur Zincir-Heywood Sep
Outline
--- Web Cache Background
--- Cooperating Web Caching System
--- The Model of the Hierarchical System
--- Performance Analysis
--- Simulation
--- Comparison
--- Conclusion
Web Cache Background
--- The WWW has become the dominant application of the Internet
--- Demand for bandwidth outstrips the supply
--- The result is congestion and server overloading
--- Web caching: storing some popular pages close to the clients
--- Two forms: browser caching and proxy caching
Cooperating Web Caching System
--- Can proxy cache servers be configured to query each other before going to the source server?
--- Based on research so far, web caching systems can be divided into:
    --- Hierarchical System
    --- Distributed System
    --- Hybrid System
Hierarchical System
[Figure: hierarchical caching tree, shown with O = 2, h = 2. Cache levels: institutional cache, regional cache, national cache; the national cache connects to the source server. Legend: router with associated cache server; router without associated cache server. Arrows: request from the client, response to the client.]
Performance Analysis ---Parameters---
--- Hit Ratio: the probability that a requested document can be retrieved from the caching system
--- Latency: the average time to retrieve a document from the Internet
--- Traffic: the average traffic on each link per unit time
--- Average Hop: the average distance, in hops, that a request travels before it is satisfied
Performance Analysis ---Assuming---
No.  Parameter name                                                Parameter value
1    Nodal outdegree of the tree (O)                               3
2    Hops between two neighboring levels of cache (h)              2
3    Hops from the national cache to the source server (z)         10 [10]
4    Average request rate from the clients (β_I)                   2 requests/sec
5    Total document number                                         1 million
6    Average document update time (∆)                              12 h
7    Skew factor of Zipf distribution (α)                          0.64 [9]
8    Cache content updating function                               LFU
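The assumed values above could be collected in one place for use in later calculations. The following is a minimal sketch only; the class name `ModelParams` and its field names are hypothetical and are not taken from the authors' simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelParams:
    """Assumed parameter values from the table (field names are illustrative)."""

    outdegree: int = 3                 # O: nodal outdegree of the tree
    hops_between_levels: int = 2       # h: hops between neighboring cache levels
    hops_to_source: int = 10           # z: hops from national cache to source server
    request_rate: float = 2.0          # beta_I: average client request rate, requests/sec
    total_documents: int = 1_000_000   # total document number
    update_time_hours: float = 12.0    # Delta: average document update time, hours
    zipf_alpha: float = 0.64           # alpha: skew factor of the Zipf distribution
    replacement_policy: str = "LFU"    # cache content updating function

    @property
    def update_time_seconds(self) -> float:
        """Delta converted to seconds, so it shares a time unit with request_rate."""
        return self.update_time_hours * 3600.0


params = ModelParams()
print(params.update_time_seconds)
```

Converting ∆ to seconds keeps it in the same unit as the request rate, which is given per second.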
Performance Analysis ---Formula for Hit Ratio---
--- The most popular C_i documents are cached.
--- A request is a hit only if it is for a document among those C_i and its request interval is less than ∆.
--- Hit Ratio H =
    --- P_N(i) is the probability that a request is for document i
    --- P(L) is the probability that the request for document i arrives within ∆
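A hedged reading of the hit ratio, assuming the two probabilities defined on this slide multiply for each document and are summed over the cached documents. This is an assumption built from those definitions, not a formula quoted from the authors. The index j is introduced here only to avoid clashing with the subscript in C_i, and P_j(L) denotes P(L) evaluated for document j.

```latex
H = \sum_{j=1}^{C_i} P_N(j)\, P_j(L)
```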
Performance Analysis ---P_N(i) & P(L)---
--- The probability of a request for document i follows a Zipf-like distribution: P_N(i) =
--- Request arrivals follow a Poisson distribution. Within a short time Ω =
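To make the two distributions concrete, here is a minimal sketch under stated assumptions. It uses the standard Zipf-like form P_N(i) = K / i^α with K chosen so the probabilities sum to 1, and it assumes that P(L) is the probability of at least one Poisson arrival for document i within ∆, where the per-document rate is the total request rate multiplied by P_N(i). The function names and the way P(L) is computed are illustrative, not the authors' definitions.

```python
import math


def zipf_pmf(n_docs: int, alpha: float) -> list[float]:
    """Zipf-like popularity: P_N(i) = K / i**alpha for ranks i = 1..n_docs."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_docs + 1)]
    total = sum(weights)  # 1/K, the normalizing constant
    return [w / total for w in weights]


def prob_within(rate: float, window: float) -> float:
    """Probability of at least one arrival of a Poisson process within `window`."""
    return 1.0 - math.exp(-rate * window)


def hit_ratio(n_docs: int, cache_size: int, alpha: float,
              request_rate: float, delta: float) -> float:
    """Sum of P_N(i) * P(L) over the `cache_size` most popular documents.

    Assumption: requests for document i form a Poisson process with rate
    request_rate * P_N(i), and P(L) is the chance of an arrival within delta.
    """
    pmf = zipf_pmf(n_docs, alpha)
    return sum(p * prob_within(request_rate * p, delta) for p in pmf[:cache_size])


# Example with a reduced document count to keep the run short (illustrative only).
h = hit_ratio(n_docs=10_000, cache_size=1_000, alpha=0.64,
              request_rate=2.0, delta=12 * 3600)
print(f"estimated hit ratio: {h:.3f}")
```

The cache is modeled as holding the most popular documents, which matches the slide's statement that the most popular C_i documents are cached.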
Performance Analysis ---Average Hit Ratio---
--- Institutional cache (i): H(i) = H(ii) + H(ir) + H(in)
--- Regional cache (r): H(r) = H(rr) + H(rn)
--- National cache (n): H(n) = H(nn)
--- Client: H_average = [H(i) O^{2h} + H(r) O^{h} + H(n)] / (O^{2h} + O^{h} + 1)
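The weighted average can be sketched in a few lines. The weights O^{2h}, O^{h}, and 1 are taken directly from the formula on this slide; the function names are hypothetical, and the per-level hit ratios in the example are placeholder inputs, not results from the model or the simulation.

```python
def level_weights(outdegree: int, hops: int) -> tuple[int, int, int]:
    """Weights for the institutional, regional, and national levels: O^(2h), O^h, 1."""
    return outdegree ** (2 * hops), outdegree ** hops, 1


def average_hit_ratio(h_inst: float, h_reg: float, h_nat: float,
                      outdegree: int, hops: int) -> float:
    """H_average = [H(i) O^(2h) + H(r) O^h + H(n)] / (O^(2h) + O^h + 1)."""
    w_inst, w_reg, w_nat = level_weights(outdegree, hops)
    weighted_sum = h_inst * w_inst + h_reg * w_reg + h_nat * w_nat
    return weighted_sum / (w_inst + w_reg + w_nat)


# Placeholder hit ratios with the assumed O = 3, h = 2 (illustrative inputs only).
print(level_weights(3, 2))
print(average_hit_ratio(0.3, 0.4, 0.5, outdegree=3, hops=2))
```

Because the institutional level carries the largest weight, H(i) dominates the average for any outdegree greater than 1.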
Simulation
--- The web cache simulator
--- Data sets:
    --- half a million, one million, and two million requests
    --- login file
--- Input & Output Component
--- Simulation Component
--- Network Cache System
Comparison
Conclusion
--- The hit ratio is a logarithmic function, or a function with a small power, of the cache size
--- The results from the model are consistent with those from the simulation
--- The model can be used to further analyze the cache system mechanism