Web Caching
Why Caching?
Faster browsing experience for users
Cache hit rate
Traffic prioritization
Reduce network bandwidth requirements significantly
Live media stream splitting
Control “who goes where”, “who does what”, and “when they can do it”
Audit employee use of corporate assets
Increase performance, increase security, improve productivity, and reduce costs!
Web Caching
[Diagram: Lisa’s Desktop and Jeff’s Desktop send GET requests over the Internet to a Server, with a Cache in the request path.]
The Cache Dilemma
Hit Rate vs. Freshness?
Why Hit Rate is Important
A better cache hit rate means:
  Higher effective bandwidth
  Lower average latency
Improve the hit rate with:
  Locality of access
  More users
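A common back-of-the-envelope model makes the link between hit rate and latency concrete: each request is either a hit served at cache latency or a miss served at origin latency. The sketch below assumes that simple two-level model; the function names and the latency figures are hypothetical.

```python
def average_latency(hit_rate: float, hit_latency_ms: float, miss_latency_ms: float) -> float:
    """Expected per-request latency when hits and misses each have a fixed cost."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * hit_latency_ms + (1.0 - hit_rate) * miss_latency_ms


def origin_traffic_fraction(hit_rate: float) -> float:
    """Share of requests that still have to cross the link to the origin server."""
    return 1.0 - hit_rate


# Hypothetical costs: 5 ms from the cache, 200 ms from the origin server.
for h in (0.0, 0.3, 0.5):
    print(f"hit rate {h:.0%}: average latency {average_latency(h, 5, 200):.1f} ms, "
          f"origin traffic {origin_traffic_fraction(h):.0%}")
```

With these numbers, raising the hit rate from 30% to 50% cuts the average latency from 141.5 ms to 102.5 ms, and every hit is also a request that no longer uses the upstream link.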
What: Content and Protocols
HTTP 1.0
  Basic protocol
  Send a request based on a fixed number of verbs: GET, HEAD, POST
  Receive the response: metadata and content
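An HTTP/1.0 request is plain text: a request line naming the verb, path, and version, optional headers, a blank line, and an optional body. A minimal sketch of serializing one; the function name is hypothetical and it deliberately accepts only the three verbs listed above.

```python
from typing import Dict, Optional

# The HTTP/1.0 verbs listed on the slide.
HTTP10_VERBS = ("GET", "HEAD", "POST")


def build_http10_request(method: str, path: str,
                         headers: Optional[Dict[str, str]] = None,
                         body: bytes = b"") -> bytes:
    """Serialize a minimal HTTP/1.0 request: request line, headers, blank line, body."""
    if method not in HTTP10_VERBS:
        raise ValueError(f"unsupported HTTP/1.0 verb: {method}")
    lines = [f"{method} {path} HTTP/1.0"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        # HTTP/1.0 requires a valid Content-Length on POST requests.
        lines.append(f"Content-Length: {len(body)}")
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii") + body


print(build_http10_request("GET", "/pub/www/index.html"))
```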
What: Content and Protocols
Example:
GET /pub/www/index.html HTTP/1.0
Response:
HTTP/… 200 OK
Server: Microsoft-IIS/5.0
Date: Sat, 19 Oct :46:53 GMT
Expires: Sun, 20 Oct :00:00 GMT
Content-Length: 2291
Content-Type: text/html
Cache-control: private
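The `Cache-control: private` header in this response matters to a proxy: it marks the response as intended for a single user, so a shared cache should not store it (the browser's own cache may). A simplified sketch of parsing the response head and making that decision; the function names are hypothetical, the sample's protocol version is assumed, and a real cache also honors `no-cache`, `max-age`, `Expires`, and more.

```python
from typing import Dict, Tuple

# Sample response modeled on the slide; the status line's protocol version is assumed
# and the Date/Expires headers are omitted.
SAMPLE_RESPONSE = (
    "HTTP/1.1 200 OK\r\n"
    "Server: Microsoft-IIS/5.0\r\n"
    "Content-Length: 2291\r\n"
    "Content-Type: text/html\r\n"
    "Cache-control: private\r\n"
    "\r\n"
)


def parse_response_head(raw: str) -> Tuple[int, str, Dict[str, str]]:
    """Split a response head into status code, reason phrase, and lower-cased headers."""
    lines = raw.split("\r\n")
    _version, code, reason = lines[0].split(" ", 2)
    headers: Dict[str, str] = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return int(code), reason, headers


def shared_cache_may_store(headers: Dict[str, str]) -> bool:
    """Simplified check: a shared proxy cache must not store 'private' or 'no-store' responses."""
    directives = {d.split("=", 1)[0].strip().lower()
                  for d in headers.get("cache-control", "").split(",")}
    return "private" not in directives and "no-store" not in directives


code, reason, headers = parse_response_head(SAMPLE_RESPONSE)
print(code, reason, headers["content-type"], shared_cache_may_store(headers))
```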
What: Content and Protocols
Example “if-modified-since”:
GET /pub/www/index.html HTTP/1.0
If-Modified-Since: Sat, 19 Oct :43:31 GMT
Response:
HTTP/… 200 OK
Server: Microsoft-IIS/5.0
Date: Thu, 13 Jul :46:53 GMT
Expires: Sun, 20 Oct :00:00 GMT
Content-Length: 2291
Content-Type: text/html
Cache-control: private
What: Content and Protocols
Example “if-modified-since”:
GET /pub/www/index.html HTTP/1.0
If-Modified-Since: Sat, 19 Oct :43:31 GMT
Response:
HTTP/… 304 Not Modified
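On the server side, the choice between the full 200 response and a 304 comes down to comparing two timestamps. A minimal sketch, assuming the server knows each page's last-modification time; the function name and the example date are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def conditional_get_status(last_modified: datetime, if_modified_since: Optional[str]) -> int:
    """Return 304 if the page is unchanged since the client's date, else 200.

    `last_modified` must be a timezone-aware datetime.
    """
    if if_modified_since is None:
        return 200
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # an invalid date is ignored and the full page is sent
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision before comparing.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200


# Hypothetical modification time for the page.
page_modified = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
print(conditional_get_status(page_modified, format_datetime(page_modified, usegmt=True)))  # 304
print(conditional_get_status(page_modified,
                             format_datetime(page_modified - timedelta(hours=1), usegmt=True)))  # 200
```

The 304 carries no body, which is why a successful revalidation costs only a round trip rather than a full transfer.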
Basic caching algorithm
Pages may be:
  Fresh: up-to-date
  Expired: current date > expiration date
  Stale: “old”
Basic caching algorithm - #2
If (page is in the cache)
    If (page is expired or stale)
        Get from server with If-Modified-Since
        If not modified
            Get from cache
        Else
            Get from server
    Else
        Get from cache
Else
    Get from server
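The decision tree above translates almost line for line into code. A minimal sketch, assuming an `origin` callable that performs the (conditional) GET and returns status, body, Last-Modified, and Expires as epoch seconds; `fetch`, `Entry`, and the one-day staleness threshold are hypothetical, since the slide only says stale means “old”. Storage is an unbounded dict here, so capacity limits are ignored.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical rule for "stale": cached copies older than a day are revalidated.
STALE_AFTER_SECONDS = 24 * 3600

# origin(url, if_modified_since) -> (status, body, last_modified, expires).
# Times are seconds since the epoch; status is 200 or 304.
Origin = Callable[[str, Optional[float]], Tuple[int, bytes, float, float]]


@dataclass
class Entry:
    body: bytes
    last_modified: float
    expires: float
    fetched_at: float


def fetch(url: str, cache: Dict[str, Entry], origin: Origin,
          now: Optional[float] = None) -> Tuple[bytes, str]:
    """Return (body, how) following the basic caching algorithm."""
    now = time.time() if now is None else now
    entry = cache.get(url)
    if entry is not None:
        expired = now > entry.expires
        stale = now - entry.fetched_at > STALE_AFTER_SECONDS
        if not (expired or stale):
            return entry.body, "hit"  # fresh: get from cache
        # Expired or stale: ask the server with If-Modified-Since.
        status, body, last_modified, expires = origin(url, entry.last_modified)
        if status == 304:  # not modified: get from cache
            entry.expires, entry.fetched_at = expires, now
            return entry.body, "revalidated"
        cache[url] = Entry(body, last_modified, expires, now)
        return body, "refetched"  # modified: get from server
    status, body, last_modified, expires = origin(url, None)  # not cached: get from server
    cache[url] = Entry(body, last_modified, expires, now)
    return body, "miss"
```

Passing `now` explicitly makes the logic easy to exercise with a fake origin; the returned label (`hit`, `revalidated`, `refetched`, `miss`) shows which branch was taken.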
Basic caching algorithm - #3
If (cache has space)
    Store the file
Else
    Delete expired pages from the cache
    Delete stale pages from the cache
    Delete LRU pages from the cache
    Delete largest/smallest pages from the cache?
    Store the file
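One way to code these replacement steps, assuming pages are charged by body size, “stale” means older than a fixed age, and each tier is evicted in least-recently-used order only until the new page fits. The class name and the stale threshold are hypothetical; the open question on the slide (evicting the largest or smallest page) is left out.

```python
from collections import OrderedDict
from typing import Optional, Tuple


class BoundedCache:
    """Byte-limited cache that makes room by evicting expired, then stale, then LRU entries."""

    def __init__(self, capacity_bytes: int, stale_after: float = 24 * 3600):
        self.capacity = capacity_bytes
        self.stale_after = stale_after  # hypothetical age at which a copy counts as stale
        self.used = 0
        # url -> (body, expires, stored_at); iteration runs from least to most recently used.
        self.items: "OrderedDict[str, Tuple[bytes, float, float]]" = OrderedDict()

    def get(self, url: str) -> Optional[bytes]:
        item = self.items.get(url)
        if item is None:
            return None
        self.items.move_to_end(url)  # mark as most recently used
        return item[0]

    def _delete(self, url: str) -> None:
        body, _, _ = self.items.pop(url)
        self.used -= len(body)

    def _make_room(self, needed: int, now: float) -> None:
        tiers = (
            lambda expires, stored: now > expires,                    # 1. expired
            lambda expires, stored: now - stored > self.stale_after,  # 2. stale
            lambda expires, stored: True,                             # 3. anything, LRU first
        )
        for should_evict in tiers:
            for url in list(self.items):  # least recently used first
                if self.used + needed <= self.capacity:
                    return
                _, expires, stored = self.items[url]
                if should_evict(expires, stored):
                    self._delete(url)

    def put(self, url: str, body: bytes, expires: float, now: float) -> bool:
        """Store a page; returns False if it is larger than the whole cache."""
        if len(body) > self.capacity:
            return False
        if url in self.items:
            self._delete(url)
        if self.used + len(body) > self.capacity:
            self._make_room(len(body), now)
        self.items[url] = (body, expires, now)
        self.used += len(body)
        return True
```

An `OrderedDict` keeps LRU bookkeeping cheap: `move_to_end` on every hit means the first key is always the least recently used page.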
Proxy Details
Without Proxy:
GET / HTTP/1.1
Host: localhost:1235
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: )
Accept: image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: keep-alive
Proxy Details
With Proxy:
GET HTTP/1.1
Host: star.cs.byu.edu
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US; rv: )
Accept: text/xml,application/xml,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Proxy-Connection: keep-alive
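The key difference between the two captures is the request line. Sent directly, the browser puts only the path there and names the server in `Host`; sent to a proxy, the request line carries the full URL so the proxy knows where to forward it (the browser also swaps `Connection` for `Proxy-Connection`). A sketch of how a proxy might tell the two forms apart; the function is hypothetical and ignores `https` and IPv6 literals.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def parse_request_target(request_line: str,
                         host_header: Optional[str] = None) -> Tuple[str, str, int, str]:
    """Return (method, host, port, path) for a request sent directly or via a proxy."""
    method, target, _version = request_line.split()
    if target.startswith("/"):
        # Origin-form (no proxy): only the path is in the request line; the host comes from Host.
        host, _, port = (host_header or "").partition(":")
        return method, host, int(port) if port else 80, target
    # Absolute-form (sent to a proxy): the full URL is in the request line.
    parts = urlsplit(target)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return method, parts.hostname or "", parts.port or 80, path


print(parse_request_target("GET / HTTP/1.1", "localhost:1235"))
print(parse_request_target("GET http://www.example.com/index.html HTTP/1.1"))
```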
Zipf’s Law
In a corpus of natural language utterances, the frequency of any word is roughly inversely proportional to its rank in the frequency table: the most frequent word occurs approximately twice as often as the second most frequent word, three times as often as the third most frequent word, and so on.
Example: in the Brown Corpus, “the” is the most frequently occurring word and accounts for nearly 7% of all word occurrences (69,971 of slightly over 1 million).
2nd place: “of”, slightly over 3.5% (36,411)
3rd place: “and” (28,852)
Only 135 words are needed to account for half of the Brown Corpus.
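The “twice as often, three times as often” statement can be checked directly against the Brown Corpus counts with a little arithmetic. A quick sketch; the helper name is hypothetical.

```python
from typing import Dict, List

# Word counts from the Brown Corpus figures quoted above.
BROWN_TOP3 = {"the": 69971, "of": 36411, "and": 28852}


def ratios_to_top(counts: Dict[str, int]) -> List[float]:
    """How many times more often the rank-1 word occurs than each ranked word."""
    ranked = sorted(counts.values(), reverse=True)
    return [ranked[0] / c for c in ranked]


for rank, ratio in enumerate(ratios_to_top(BROWN_TOP3), start=1):
    print(f"rank {rank}: top word is {ratio:.2f}x as frequent (exact inverse-rank law: {rank}x)")
```

The ratios come out to about 1.92 for “of” and 2.43 for “and”: close to the ideal 2 at rank 2 and somewhat below the ideal 3 at rank 3, which is why the law is stated as “roughly” inversely proportional.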
Zipf’s law
Zipf’s law: the frequency $P_i$ of an event as a function of its rank $i$ is a power-law function:
$P_i = \Omega / i^{\alpha}$, where $\alpha \le 1$
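For a finite set of n ranked items, a natural reading of Ω is the normalizing constant that makes the probabilities sum to 1 (an assumption here; the slide leaves Ω unspecified). A sketch of the resulting distribution; the function name is hypothetical.

```python
from typing import List


def zipf_pmf(n: int, alpha: float) -> List[float]:
    """P_i = Omega / i**alpha for ranks i = 1..n, with Omega chosen so the P_i sum to 1."""
    if n < 1:
        raise ValueError("n must be at least 1")
    weights = [1.0 / i ** alpha for i in range(1, n + 1)]
    omega = 1.0 / sum(weights)
    return [omega * w for w in weights]


p = zipf_pmf(1000, 0.8)
print(f"P_1 = {p[0]:.4f}, P_2 = {p[1]:.4f}, P_1 / P_2 = {p[0] / p[1]:.3f}")
```

The ratio $P_1 / P_2 = 2^{\alpha}$ does not depend on n or Ω; only $\alpha = 1$ gives the exact “twice as often” of the word-frequency version.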
Zipf’s law
Observed to hold for:
  Frequency of written words in English texts
  Population of cities
  Income of a company as a function of rank
Zipf’s law and web access
For a given server, page accesses by rank follow Zipf’s law
Web requests from a fixed population of users follow Zipf’s law, with 0.64 < α < 0.83
Observations
The top 1% of all documents account for 20% to 35% of proxy requests
The top 10% account for 45% to 55% of requests
It takes 25% to 40% of all documents to account for 70% of requests
It takes 70% to 80% of all documents to account for 90% of requests
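To relate the α range for web requests to these observations, one can compute how much of the request stream the most popular slice of documents would receive under an idealized Zipf distribution. A sketch assuming independent requests and a hypothetical catalog of 100,000 documents; the function name is hypothetical, and real proxy traces need not match the model.

```python
import math


def top_fraction_share(n_docs: int, alpha: float, top_fraction: float) -> float:
    """Share of requests going to the most popular `top_fraction` of n_docs under Zipf(alpha)."""
    weights = [i ** -alpha for i in range(1, n_docs + 1)]
    k = max(1, math.floor(top_fraction * n_docs))
    return sum(weights[:k]) / sum(weights)


# Hypothetical catalog size; alpha spans the range quoted for web requests.
N_DOCS = 100_000
for alpha in (0.64, 0.83):
    for frac in (0.01, 0.10):
        share = top_fraction_share(N_DOCS, alpha, frac)
        print(f"alpha={alpha}: top {frac:.0%} of documents -> {share:.0%} of requests")
```

The result depends on the catalog size as well as α, so this is a way to explore the trend rather than to reproduce the measured percentages.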
Observations
For an infinite-sized cache, the hit ratio of a web proxy grows in a log-like fashion as a function of the proxy’s client population and of the number of requests it sees.
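A simple model shows how such a curve arises. If requests are drawn independently from a Zipf distribution (the independent reference model, an assumption here rather than something stated on the slide), an infinite cache misses only on the first request for each document, so the expected hit ratio follows from the expected number of distinct documents requested. The function name, catalog size, and α below are hypothetical.

```python
def expected_hit_ratio(n_docs: int, alpha: float, n_requests: int) -> float:
    """Expected hit ratio of an infinite cache after n_requests independent Zipf(alpha) requests.

    With unlimited space, only the first request for each document misses, so
    hit ratio = 1 - E[distinct documents requested] / n_requests.
    """
    weights = [i ** -alpha for i in range(1, n_docs + 1)]
    total = sum(weights)
    # Document i is requested at least once with probability 1 - (1 - p_i) ** n_requests.
    expected_distinct = sum(1.0 - (1.0 - w / total) ** n_requests for w in weights)
    return 1.0 - expected_distinct / n_requests


# Hypothetical catalog of 100,000 documents with alpha = 0.75.
for requests in (10**3, 10**4, 10**5, 10**6):
    print(f"{requests:>9,} requests -> expected hit ratio {expected_hit_ratio(100_000, 0.75, requests):.3f}")
```

In this model a larger client population matters only through the larger number of requests it generates; trying different catalog sizes and values of α shows how the shape of the curve changes.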
Local URL Resolution Protocol
Peer-to-peer web cache
Bootstrapping & peer discovery: UDP broadcast
Content location: UDP broadcast for content
Content delivery: direct download from a single peer
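The three phases can be modeled in a few lines if the network is left out. The sketch below is hypothetical throughout: a Python list of peers stands in for the peers found by broadcast discovery, and asking every peer in turn stands in for the UDP broadcast query. Message formats, timeouts, and how to choose among several responders are not specified on the slide, so the first holder is simply used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Peer:
    """A desktop taking part in the peer-to-peer cache (hypothetical in-memory model)."""
    name: str
    cache: Dict[str, bytes] = field(default_factory=dict)

    def has(self, url: str) -> bool:
        # Answer to a broadcast "who has this URL?" query.
        return url in self.cache

    def serve(self, url: str) -> bytes:
        return self.cache[url]


def locate_and_fetch(url: str, lan_peers: List[Peer],
                     fetch_from_origin: Callable[[str], bytes]) -> Tuple[bytes, str]:
    # Content location: every peer sees the query, standing in for a UDP broadcast.
    holders = [peer for peer in lan_peers if peer.has(url)]
    if holders:
        # Content delivery: download directly from a single peer (here, the first holder).
        return holders[0].serve(url), holders[0].name
    return fetch_from_origin(url), "origin"


peers = [Peer("Jeff"), Peer("Lisa", {"http://www.example.com/": b"<html>...</html>"})]
print(locate_and_fetch("http://www.example.com/", peers, lambda u: b"from origin"))
```

A real implementation would send the query as a UDP datagram to the LAN broadcast address (for example on a socket with `SO_BROADCAST` enabled) and then open a direct connection to the chosen peer.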