Web Caching Schemes1 A Survey of Web Caching Schemes for the Internet Jia Wang
Agenda: The World Wide Web; Problem and solution (caching); Proxy servers; Advantages of web caching; Disadvantages of web caching; Elements of a WWW caching system; Desirable properties of a WWW caching system; Problems in designing caching systems for the WWW; Caching architecture
The World Wide Web: The WWW can be considered a large distributed information system. It is growing exponentially in size: in May 1999 it included 600 million static web pages, and it increases by 15% per month. It is very popular.
[Figure: size of distinct static web pages]
The World Wide Web: Usage is relatively inexpensive. Accessing information is very fast. Documents appeal to a wide range of interests. But ...
The World Wide Web: Network congestion. Server overloading.
Problem: Internet backbone capacity increases by 60% per year, so bandwidth is not growing fast enough to keep up with the Web. Without a solution, the WWW will become too congested and its entire appeal will be lost.
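To put the two growth figures quoted in these slides (15% per month for static pages, 60% per year for backbone capacity) on the same yearly scale, here is a rough calculation. It assumes both rates compound and hold steady, which is a simplification, and it compares growth factors only, since page count and capacity are different quantities.

```python
# Compare yearly growth factors, assuming steady compound growth.
MONTHLY_PAGE_GROWTH = 0.15      # 15% per month, from the slides
YEARLY_CAPACITY_GROWTH = 0.60   # 60% per year, from the slides

# Twelve months of 15% growth compound multiplicatively.
pages_per_year = (1 + MONTHLY_PAGE_GROWTH) ** 12
capacity_per_year = 1 + YEARLY_CAPACITY_GROWTH

print(f"pages grow by a factor of {pages_per_year:.2f} per year")
print(f"capacity grows by a factor of {capacity_per_year:.2f} per year")
```

Under these assumptions the page count grows more than fivefold per year, while capacity grows by a factor of 1.6.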
Solution: Caching, that is, placing popular objects at locations close to the clients.
Proxy servers: HTTP servers run by companies, typically for security reasons. They sit at the bottleneck of the connection between the clients and the Internet and are shared by all clients inside the firewall.
Proxy servers: Clients belonging to the same organization share common interests, so they probably access the same set of documents.
Thus: On the proxy server, a previously requested and cached document is likely to result in future hits.
Proxy servers: Caching the most popular web pages on the proxy server can save network bandwidth and lower access latency for the client.
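The idea of a shared proxy cache can be sketched in a few lines. This is a toy model, not a real HTTP proxy: the `ProxyCache` class, the `fake_origin` function, and the request list are all hypothetical names invented for illustration.

```python
from typing import Callable, Dict, List


class ProxyCache:
    """Toy shared proxy cache: repeats are served locally, misses go to the origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self._fetch = fetch_from_origin      # hypothetical origin-fetch callback
        self._store: Dict[str, bytes] = {}
        self.hits = 0
        self.misses = 0

    def get(self, url: str) -> bytes:
        if url in self._store:               # cache hit: no traffic to the origin
            self.hits += 1
            return self._store[url]
        self.misses += 1                     # cache miss: fetch once, keep a copy
        body = self._fetch(url)
        self._store[url] = body
        return body


origin_requests: List[str] = []


def fake_origin(url: str) -> bytes:
    """Stand-in for a remote web server; records every request it receives."""
    origin_requests.append(url)
    return f"<html>{url}</html>".encode()


proxy = ProxyCache(fake_origin)
# Several clients behind the same proxy asking for overlapping documents.
for client_request in ["/news", "/news", "/sports", "/news"]:
    proxy.get(client_request)

print(proxy.hits, proxy.misses, len(origin_requests))
```

Of the four client requests, only two reach the origin server. The two repeated requests for `/news` are served from the proxy, which is where the bandwidth and latency savings come from.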
Advantages of web caching: Reduces bandwidth consumption, which decreases network traffic and lessens network congestion. Reduces access latency: frequently used documents are cached nearby, and the lighter traffic shortens the delay even for documents that are not cached.
Advantages of web caching (cont.): Reduces the workload of the remote server. Data can be accessed when the remote server is down (enhanced robustness). Allows analysis of an organization's usage patterns. Cooperation between caches increases efficiency.
Disadvantages of web caching: Cached data is not updated automatically, so clients may see stale copies. A cache miss can increase latency (extra proxy processing). Bottleneck effect: the number of clients per proxy must be limited. A single proxy is a single point of failure. Information providers cannot monitor the number of visits per site.
Elements of a WWW caching system: Documents can be cached at the clients, the proxies, and the servers.
[Figure: elements of a WWW caching system]
Desirable properties of a WWW caching system: fast access, robustness, transparency, scalability, efficiency, adaptivity, stability, load balancing, ability to deal with heterogeneity, simplicity
Fast access: Reduce web access latency to a minimum, especially compared with servers that do not use caching techniques.
Robustness: Robustness means availability to the user. Eliminate single points of failure; in case of failure, degrade gracefully; make it easy to recover from failure.
Transparency: The caching system should be transparent to the user, who should notice only a faster response and higher availability.
Scalability: Scale well with the increasing size and density of the network. All protocols should be as lightweight as possible.
Efficiency: Impose minimal additional burden on the network (in control and data packets). Do not adopt any scheme that leads to under-utilization of the network.
Adaptivity: Adapt to dynamic changes in user demand and the network environment in order to achieve optimal performance.
Stability: Do not introduce instabilities into the network.
Load balancing: Distribute load evenly through the entire network, with no bottlenecks or hot spots.
Ability to deal with heterogeneity: Adapt to a range of network architectures (hardware and software).
Simplicity: The mechanism should be simple to deploy. Simpler schemes are easier to implement and more likely to be accepted as international standards.
What problems do we face in designing caching systems for the WWW?
Problems in designing caching systems for the WWW: Caching system architecture. How are cache proxies organized: hierarchically, distributed, or hybrid?
Problems in designing caching systems for the WWW: Proxy placement. Where should a cache proxy be placed in order to optimize performance?
Problems in designing caching systems for the WWW: Caching contents. What can be cached in the caching system?
Problems in designing caching systems for the WWW: Proxy cooperation. How do proxies cooperate with each other?
Problems in designing caching systems for the WWW: Data sharing. What kind of data or information can be shared among cooperating proxies?
Problems in designing caching systems for the WWW: Cache resolution/routing. How does a proxy decide where to fetch a page requested by a client?
Problems in designing caching systems for the WWW: Prefetching. How does a proxy decide what and when to prefetch from web servers or other proxies to reduce access latency?
Problems in designing caching systems for the WWW: Cache placement/replacement. How does the proxy decide which pages to store in its cache and which pages to remove from it?
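As one illustration of a replacement decision, the sketch below uses least-recently-used (LRU) eviction. LRU is chosen here only because it is a widely known policy; the slides do not name a specific one, and the `LRUCache` class and its method names are hypothetical.

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity page cache that evicts the least recently used page."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._pages: "OrderedDict[str, str]" = OrderedDict()

    def get(self, url: str):
        if url not in self._pages:
            return None                       # miss
        self._pages.move_to_end(url)          # mark as most recently used
        return self._pages[url]

    def put(self, url: str, page: str) -> None:
        self._pages[url] = page
        self._pages.move_to_end(url)
        if len(self._pages) > self.capacity:
            self._pages.popitem(last=False)   # drop the least recently used page


cache = LRUCache(capacity=2)
cache.put("/a", "A")
cache.put("/b", "B")
cache.get("/a")          # "/a" is now more recently used than "/b"
cache.put("/c", "C")     # cache is full, so "/b" is evicted
print(cache.get("/a"), cache.get("/b"), cache.get("/c"))
```

Other policies weigh different signals, such as document size or access frequency; the structure of the decision (admit on `put`, evict when full) stays the same.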
Problems in designing caching systems for the WWW: Cache coherency. How does a proxy maintain data consistency?
Problems in designing caching systems for the WWW: Control information distribution. How is the control information (e.g., URLs) distributed among proxies?
Problems in designing caching systems for the WWW: Dynamic data caching. How should data that is not cacheable be handled?
Caching architecture: Hierarchical. Caches are placed at multiple levels of the network: national, regional, institutional, and bottom.
Hierarchical architecture: The bottom level consists of the client/browser caches. When a web page is not found at one level, the request is passed up to the next: bottom, institutional, regional, national.
Hierarchical architecture: After the web page is found, it is forwarded back down through the levels (national, regional, institutional, bottom), leaving a copy at each level.
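The hierarchical lookup can be sketched as a chain of caches, each with a parent. This is a minimal model under stated assumptions: the `CacheLevel` class, the `origin` function, and the two bottom-level caches are hypothetical, and network delays are ignored.

```python
class CacheLevel:
    """One cache in a hierarchy; unresolved requests are passed to the parent."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.store = {}

    def lookup(self, url, origin):
        """Return (page, name of the place where the page was found)."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            page, found_at = self.parent.lookup(url, origin)   # pass the miss upward
        else:
            page, found_at = origin(url), "origin server"      # top level asks the server
        self.store[url] = page   # leave a copy as the page travels back down
        return page, found_at


def origin(url):
    """Stand-in for the origin web server."""
    return f"content of {url}"


national = CacheLevel("national")
regional = CacheLevel("regional", parent=national)
institutional = CacheLevel("institutional", parent=regional)
browser_1 = CacheLevel("bottom-1", parent=institutional)
browser_2 = CacheLevel("bottom-2", parent=institutional)

first = browser_1.lookup("/page", origin)    # misses at every level
second = browser_2.lookup("/page", origin)   # resolved by the shared institutional cache
print(first[1], "|", second[1])
```

The first request travels all the way to the origin server. The second client, in the same institution, is served by the institutional cache, which kept a copy on the way down.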
Hierarchical architecture, advantages: Bandwidth efficient, especially when some cache servers have slow network connections. Popular web pages diffuse efficiently toward the demand.
Hierarchical architecture, disadvantages: Cache servers need to be placed at key access points of the network, which requires coordination among caches. Each level adds a delay. High levels are bottlenecks. Multiple copies of the same document are stored at different cache levels.
Distributed architecture: Caches exist at the bottom level only, with no intermediate caching levels. Each cache server holds metadata about the data stored on other servers. A hierarchy is used only for distributing information about the location of copies; the actual documents are not copied through it.
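A rough sketch of the metadata idea follows. Each cache keeps a directory saying which peer is believed to hold which URL, and only that directory information is shared. As a simplification, the sketch has caches advertise directly to their peers rather than through a distribution hierarchy; `DistributedCache` and all of its methods are hypothetical names.

```python
class DistributedCache:
    """Bottom-level cache that knows, via metadata, what its peers hold."""

    def __init__(self, name):
        self.name = name
        self.store = {}        # documents held locally
        self.directory = {}    # url -> peer believed to hold it (metadata only)
        self.peers = []

    def join(self, all_caches):
        self.peers = [c for c in all_caches if c is not self]

    def _advertise(self, url):
        # Only the location of the copy is distributed, never the document.
        for peer in self.peers:
            peer.directory.setdefault(url, self)

    def get(self, url, origin):
        """Return (page, where it came from)."""
        if url in self.store:
            return self.store[url], "local"
        holder = self.directory.get(url)
        if holder is not None and url in holder.store:
            page, source = holder.store[url], holder.name   # fetch from a peer
        else:
            page, source = origin(url), "origin server"
        self.store[url] = page
        self._advertise(url)
        return page, source


def origin(url):
    """Stand-in for the origin web server."""
    return f"content of {url}"


cache_a, cache_b, cache_c = (DistributedCache(n) for n in ("cache-a", "cache-b", "cache-c"))
for c in (cache_a, cache_b, cache_c):
    c.join([cache_a, cache_b, cache_c])

r1 = cache_a.get("/x", origin)   # nobody has it yet
r2 = cache_b.get("/x", origin)   # directory points at cache-a
r3 = cache_b.get("/x", origin)   # now held locally
print(r1[1], "|", r2[1], "|", r3[1])
```

No intermediate level stores the document; a miss is resolved either by a peer at the same level or by the origin server.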
Distributed architecture, advantages: Traffic flows through low network levels, which are less congested. No additional disk space is required for intermediate network levels. Better load sharing. More fault tolerant.
Distributed architecture, disadvantages: High connection times. Higher bandwidth usage. Administrative issues.
Distributed architecture, examples: ICP (Internet Cache Protocol, Harvest group): retrieve data from neighboring caches and parent caches. CARP (Cache Array Routing Protocol): the URL space is divided among an array of caches, and each cache stores only the documents whose URLs hash to it.
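The hashing idea behind CARP can be illustrated with a highest-score scheme: every cache gets a score for a given URL, and the URL belongs to the cache with the highest score. The sketch below is in the spirit of CARP only. The real protocol defines its own hash function and per-member load factors, whereas this example substitutes SHA-256; the cache names and URLs are made up.

```python
import hashlib


def score(cache_name: str, url: str) -> int:
    """Combine cache name and URL into a deterministic score (SHA-256 stand-in)."""
    digest = hashlib.sha256(f"{cache_name}|{url}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def pick_cache(caches, url: str) -> str:
    """Route the URL to the cache with the highest score for it."""
    return max(caches, key=lambda name: score(name, url))


caches = ["cache-a", "cache-b", "cache-c"]
urls = [f"http://example.com/page{i}" for i in range(100)]

before = {u: pick_cache(caches, u) for u in urls}
after = {u: pick_cache(caches + ["cache-d"], u) for u in urls}

# URLs whose owner changed when a fourth cache joined the array.
moved = [u for u in urls if before[u] != after[u]]
print(f"{len(moved)} of {len(urls)} URLs moved, all to cache-d:",
      all(after[u] == "cache-d" for u in moved))
```

Because every client computes the same owner for a URL, no query messages are needed to locate a document. When a cache joins, the only URLs that move are those whose highest score now belongs to the new member.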
Hybrid architecture: Caches may cooperate with other caches at the same level or at a higher level using distributed caching. ICP is an example: the document is fetched from the parent or neighbor cache that has the lowest RTT.
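The lowest-RTT selection can be sketched as follows. Real ICP exchanges query and reply messages over the network; here a query is modeled as a plain method call, and the `Neighbor` class, the cache names, and the RTT values are all invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Neighbor:
    """A parent or sibling cache, with a measured round-trip time."""
    name: str
    rtt_ms: float
    cached_urls: Set[str] = field(default_factory=set)

    def has(self, url: str) -> bool:
        # Stands in for an ICP query that is answered with a hit or a miss.
        return url in self.cached_urls


def choose_source(url: str, neighbors: List[Neighbor]) -> Optional[Neighbor]:
    """Among the neighbors reporting a hit, pick the one with the lowest RTT."""
    holders = [n for n in neighbors if n.has(url)]
    if not holders:
        return None                      # caller falls back to the origin server
    return min(holders, key=lambda n: n.rtt_ms)


neighbors = [
    Neighbor("parent", rtt_ms=40.0, cached_urls={"/a", "/b"}),
    Neighbor("sibling-1", rtt_ms=5.0, cached_urls={"/a"}),
    Neighbor("sibling-2", rtt_ms=12.0, cached_urls={"/b"}),
]

for url in ["/a", "/b", "/c"]:
    source = choose_source(url, neighbors)
    print(url, "->", source.name if source else "origin server")
```

In this example both `/a` and `/b` are available from the parent, but each is fetched from a closer sibling instead; `/c` is held by no neighbor and falls through to the origin server.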
Performance of architectures: Hierarchical caching has shorter connection times than distributed caching, and the additional copies at intermediate levels reduce retrieval latency for small documents. Distributed caching has shorter transmission times and higher bandwidth usage. A “well-configured” hybrid scheme can reduce both connection time and transmission time.