Analysis of Web Caching Architectures: Hierarchical and Distributed Caching
Pablo Rodriguez, Christian Spanner, and Ernst W. Biersack
IEEE/ACM Transactions on Networking, Vol. 9, No. 4, August 2001
Abstract
- Caching architectures: hierarchical, distributed, and hybrid
- Analytical models of performance:
  - Connection time
  - Transmission time
  - Total latency
  - Bandwidth usage
  - Cache load
Caching architectures
- Hierarchical caching:
  - Institutional cache
  - Intermediate (regional) cache
  - National cache
- Distributed caching:
  - Institutional cache only
Network topology
The model
- Network model: full O-ary tree
- Document model:
  - Requests: arrive according to a Poisson process
  - Popularity: follows a Zipf distribution
- Hierarchical caching: caches are placed at the access points between two different networks.
- Distributed caching: caches are placed only at the institutional networks.
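The document model can be illustrated with a short sketch. It assumes a Zipf exponent `alpha` (a free parameter here; `alpha = 1` gives classic Zipf, and no value from the paper is implied) and an aggregate Poisson request rate that is split among documents in proportion to their popularity. The function names are hypothetical.

```python
import random


def zipf_probabilities(n_docs, alpha=1.0):
    """Popularity of documents 1..n_docs: p_i proportional to 1 / i**alpha."""
    weights = [1.0 / (i ** alpha) for i in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def per_document_rates(total_rate, n_docs, alpha=1.0):
    """Split an aggregate Poisson request rate among documents by popularity.

    Randomly splitting a Poisson process yields independent Poisson
    processes, so document i is requested at rate total_rate * p_i.
    """
    return [total_rate * p for p in zipf_probabilities(n_docs, alpha)]


def sample_arrivals(rate, horizon, rng):
    """Poisson arrival times in [0, horizon) from exponential inter-arrival gaps."""
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t >= horizon:
            return times
        times.append(t)


# Example: 100 requests/s spread over 5 documents with alpha = 1.
rates = per_document_rates(100.0, 5)
arrivals = sample_arrivals(rates[0], 1.0, random.Random(42))
```

Because a randomly split Poisson process stays Poisson, each document's request stream can be analyzed on its own, which keeps per-document reasoning about cache hits simple.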
Network model
Document model
Properties and limitations of the model
- O-ary trees are a good model of the network topology.
- Other networks can easily be modeled by changing the height or the number of tiers of the tree.
- The model assumes homogeneous client communities, but heterogeneous client communities can also be modeled easily.
- Simulation results should be read as relative comparisons between architectures, not as absolute predictions.
Connection time
- Depends on the number of network links between the client and the cache that holds the document.
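Connection time (cont’d)
- Distance traveled: a request first travels up the tree, then down to the cache or server that holds the document.
- Opening the connection requires a TCP three-way handshake with that cache or server.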
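To make the "up then down" path concrete, the sketch below numbers the leaves of a full O-ary tree from left to right (an assumption for illustration) and counts the links between two leaves through their lowest common ancestor, as when a request in distributed caching is served by another institutional cache. The handshake delay assumes every link adds the same one-way delay and ignores queuing. All names are hypothetical.

```python
def lca_height(a, b, order):
    """Height above the leaves of the lowest common ancestor of leaves a and b.

    Leaves of a full tree with `order` (= O) children per node are numbered
    0 .. O**H - 1 from left to right, so a node's parent index is index // O.
    """
    height = 0
    while a != b:
        a //= order  # move both nodes one level up
        b //= order
        height += 1
    return height


def links_between_leaves(a, b, order):
    """The request climbs to the common ancestor, then descends: 2 * height links."""
    return 2 * lca_height(a, b, order)


def handshake_delay(links, per_link_delay):
    """One round trip over `links` hops (TCP three-way handshake) before the
    request can be sent, assuming every link adds the same one-way delay."""
    return 2 * links * per_link_delay


print(links_between_leaves(0, 3, 4))   # → 2 (siblings under one parent)
print(links_between_leaves(0, 63, 4))  # → 6 (meet only at the root when H = 3)
```

The farther apart two institutional networks sit in the tree, the higher their common ancestor and the longer the handshake round trip, which is why connection time grows with network distance.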
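Transmission time
- Caches operate in cut-through mode: a cache forwards a document's data as it arrives instead of first storing the whole document.
- The request rate on each link determines its load and therefore its congestion.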
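The effect of cut-through operation can be seen by comparing it with store-and-forward in a simplified model that ignores propagation delay and queuing: store-and-forward pays the full transmission time on every hop, while cut-through is limited by the slowest link on the path. The document size, link rates, and function names are illustrative assumptions.

```python
def store_and_forward_time(size_bits, link_rates_bps):
    """Each hop receives the whole document before forwarding it,
    so the per-hop transmission times add up."""
    return sum(size_bits / rate for rate in link_rates_bps)


def cut_through_time(size_bits, link_rates_bps):
    """Each hop forwards data as it arrives; ignoring per-hop latency,
    the document streams at the rate of the slowest (bottleneck) link."""
    return size_bits / min(link_rates_bps)


size = 800_000                  # bits (a 100 KB document, illustrative)
links = [10e6, 1e6, 45e6]       # bit/s per hop, illustrative values
sf = store_and_forward_time(size, links)  # about 0.898 s
ct = cut_through_time(size, links)        # 0.8 s, set by the 1 Mbit/s link
```

In this view, a congested link (one carrying a high request rate, so little capacity is left for a given transfer) dominates the transmission time regardless of how fast the other links are.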
Comparison
- Parameters: O = 4 (children per tree node), H = 3 (tree height), z = 10, N = 250 million
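The size of a single full O-ary tree with the comparison parameters O = 4 and H = 3 follows from a geometric series. The sketch covers one tree only; it does not model how the institutional, regional, and national tiers are combined, and the function names are hypothetical.

```python
def leaf_count(order, height):
    """Number of leaves of a full O-ary tree of height H: O**H."""
    return order ** height


def node_count(order, height):
    """All nodes of a full O-ary tree: 1 + O + O**2 + ... + O**H."""
    return sum(order ** level for level in range(height + 1))


O, H = 4, 3
print(leaf_count(O, H))   # → 64
print(node_count(O, H))   # → 85, equal to (O**(H + 1) - 1) // (O - 1)
```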
Connection time
Network traffic at every tree level
Expected transmission time
(a) Non-congested national network
(b) Congested national network
Total latency
Heterogeneous client communities
(a) Expected connection time
(b) Expected transmission time
Bandwidth usage
- Measured as the expected number of links traversed to distribute one packet to the clients
(a) Regional network
(b) National network
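Bandwidth usage as an expected link count can be sketched as a weighted sum: each place that can serve a request contributes the probability of being served there times the number of links from that place back to the client. The same formula applies to both architectures; only the serving places and their distances differ. The probabilities and link counts in the example are made-up values, not results from the paper, and the function name is hypothetical.

```python
def expected_links(serve_probs, link_counts):
    """Expected links traversed per delivered packet.

    serve_probs[k]: probability that a request is served at place k
    (a cache level or the origin server); link_counts[k]: links from
    place k back to the client.
    """
    if len(serve_probs) != len(link_counts):
        raise ValueError("need one link count per serving place")
    if abs(sum(serve_probs) - 1.0) > 1e-9:
        raise ValueError("serving probabilities must sum to 1")
    return sum(p * d for p, d in zip(serve_probs, link_counts))


# Made-up values: institutional, regional, national cache, origin server.
links_per_packet = expected_links([0.5, 0.2, 0.1, 0.2], [2, 4, 6, 12])  # about 4.8
```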
Cache load
- Measured by the filtered request rate: the rate of requests that miss in the lower-level caches and reach a given cache.
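One simple way to compute a filtered request rate is a renewal argument, assuming Poisson requests at rate λ for a document and that a fetched copy stays valid for a fixed time Δ (a TTL-style assumption; this is not claimed to be the paper's exact formula). Each miss starts a cycle of Δ seconds of hits followed by an exponential wait with mean 1/λ for the next request, which misses again, so misses occur at rate 1/(Δ + 1/λ) = λ/(1 + λΔ). The function names are hypothetical.

```python
def filtered_rate(request_rate, expiry):
    """Miss rate of one cache for one document under the TTL-style assumption.

    Misses per unit time = 1 / (expiry + 1 / request_rate)
                         = request_rate / (1 + request_rate * expiry).
    """
    if request_rate <= 0:
        return 0.0
    return request_rate / (1.0 + request_rate * expiry)


def parent_load(child_rates, expiry):
    """Request rate reaching a parent cache: the sum of its children's misses."""
    return sum(filtered_rate(r, expiry) for r in child_rates)


# Four children, each seeing 2 requests/s, copies valid for 1 s.
load = parent_load([2.0] * 4, 1.0)  # about 2.67 requests/s instead of 8
```

The filtering effect grows with λΔ: popular documents are absorbed almost entirely by the lower-level caches, so upper-level caches mainly see requests for less popular documents.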
Disk space
- Required disk space = average Web document size S × average number of document copies present in the caching infrastructure.
- The average number of copies is calculated from the probability that a new copy of a document is created at each cache level.
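The disk-space estimate can be sketched as S times an expected copy count, where each level contributes its number of caches times the probability that a cache at that level holds a copy. Treating that probability as a given per-level input is an assumption made for illustration; the cache counts, probabilities, and function names below are hypothetical.

```python
def expected_copies(caches_per_level, copy_probs):
    """Average copies of one document: sum over levels of
    (number of caches at the level) * P(a cache at that level holds a copy)."""
    return sum(n * p for n, p in zip(caches_per_level, copy_probs))


def disk_space(avg_doc_size, caches_per_level, copy_probs_per_doc):
    """Average document size S times the expected number of copies,
    summed over all documents. copy_probs_per_doc[i][l] is the probability
    that a level-l cache holds a copy of document i."""
    total_copies = sum(
        expected_copies(caches_per_level, probs) for probs in copy_probs_per_doc
    )
    return avg_doc_size * total_copies


# Made-up example: 64 institutional, 16 regional, 1 national cache; two documents.
space = disk_space(10_000, [64, 16, 1], [[0.1, 0.3, 0.9], [0.0, 0.0, 1.0]])
```

With these made-up values the first document has about 12.1 expected copies and the second exactly 1, giving roughly 131,000 bytes.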
Disk space (cont’d)
A hybrid caching scheme
- At every network level, a certain number k of caches cooperate.
- When a document cannot be found in a cache:
  - The cache checks whether the document resides in any of its cooperating caches.
  - If several cooperating caches hold a copy, the neighbor cache with the lowest latency is selected.
  - Otherwise, the request is forwarded to the immediate parent cache, or to the origin server if there is no parent.
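The lookup procedure of the hybrid scheme can be sketched as follows. The `Cache` class, its fields, and the latency values are hypothetical; the sketch only determines where a request is served and does not model how copies are placed on the way back or how a cache learns what its neighbors hold.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Cache:
    name: str
    documents: set = field(default_factory=set)
    parent: Optional[Cache] = None
    # Cooperating caches at the same level as (latency, cache) pairs.
    neighbors: list = field(default_factory=list)


def resolve(cache, doc):
    """Return the name of the place that serves `doc` for a request at `cache`."""
    if doc in cache.documents:
        return cache.name  # local hit
    holders = [(lat, n) for lat, n in cache.neighbors if doc in n.documents]
    if holders:
        # Several cooperating caches may hold a copy: pick the lowest latency.
        return min(holders, key=lambda pair: pair[0])[1].name
    if cache.parent is not None:
        return resolve(cache.parent, doc)  # climb one level and repeat
    return "origin server"


national = Cache("national", {"c"})
b, c = Cache("B", {"b"}), Cache("C", {"b"})
a = Cache("A", {"a"}, parent=national, neighbors=[(5.0, b), (2.0, c)])
print(resolve(a, "b"))  # → C (the closer of the two neighbors holding "b")
```

Because the parent runs the same procedure, cooperation happens at every level before a request finally reaches the origin server.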
Connection time
Connection time (cont’d)
Transmission time
Transmission time (cont’d)
Total latency
Bandwidth usage
(a) National network
(b) Regional network
Cache load
Conclusions
- Hierarchical caching architecture:
  - Reduces the expected network distance to hit a document
  - Decreases bandwidth usage
  - Reduces administrative concerns
  - Needs powerful intermediate caches or load-balancing algorithms
- Distributed caching architecture:
  - Large network distances
  - High bandwidth usage
  - Administrative issues
- The hybrid scheme performs best