1 Spring Semester 2007, Dept. of Computer Science, Technion Internet Networking recitation #13 Web Caching Protocols ICP, CARP.
2 ICP - Internet Caching Protocol
ICP is a Web caching protocol. ICP version 2 is defined in RFC 2186 as a message format used for communicating among Web caches. It is used to exchange hints about the existence of URLs in neighbor caches: caches exchange ICP queries and replies to gather the information they need to select the most appropriate location from which to retrieve an object.
3 ICPv2 Protocol Specification
Web caches generally use HTTP to transfer object data. However, caches can benefit from a simpler, lighter protocol for communicating with each other. ICP is primarily used in a cache mesh to locate specific Web objects in neighboring caches: one cache sends an ICP query to its neighbors, and the neighbors send back ICP replies indicating a "HIT" or a "MISS."
4 ICP Implementation
In current practice, ICP is implemented on top of UDP, although there is no requirement that it be limited to UDP. ICP over UDP offers features important to Web caching applications. The query/reply exchange must occur quickly, because a cache cannot wait long before it begins retrieving an object. Failure to receive a reply message means that the network path is either congested or broken; in either case, we would not want to select that neighbor.
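A minimal sketch of the "no reply means skip this neighbor" rule, using Python's standard `socket` module. The function name `query_neighbor`, the one-second default timeout, and the 4096-byte receive buffer are illustrative assumptions rather than values taken from ICP, and the message is treated as opaque bytes instead of a real ICP packet.

```python
import socket


def query_neighbor(addr, message, timeout=1.0):
    """Send one UDP datagram to a neighbor cache and wait briefly for a reply.

    Returns the reply payload, or None if no reply arrives within `timeout`
    seconds. A missing reply is taken to mean the path is congested or broken
    (or the neighbor is down), so the caller should not select this neighbor.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(message, addr)
        try:
            reply, _sender = sock.recvfrom(4096)
        except OSError:
            # socket.timeout is a subclass of OSError; some platforms also
            # surface an ICMP "port unreachable" here as an OSError.
            return None
        return reply
```

Because UDP is connectionless, a query costs one datagram each way with no handshake, and the timeout doubles as a reachability check.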
5 Cache Selection
ICP messages can also be used for cache selection. Failure to receive a reply from a cache indicates a network or system failure. The ICP reply may also include extra information that can help select the most appropriate source from which to retrieve an object.
6 ICPv2 Application Specification (RFC 2187)
A single Web cache reduces the amount of traffic generated by the clients behind it. Similarly, a group of Web caches can benefit by sharing another cache in much the same way. In a cache hierarchy (or mesh), one cache establishes peering relationships with its neighbor caches.
7 Web Cache Hierarchies
There are two types of cache relationship. A parent cache is essentially one level up in a cache hierarchy, while a sibling cache is on the same level. A neighbor (peer) is either a parent or a sibling that is a single “cache hop” away.
8 A Simple Web Cache Hierarchy
[Figure: cache clients served by a local cache, with a sibling cache (hits resolved), a parent cache (hits and misses resolved), and direct retrievals from the Internet.]
9 Levels
The general flow of document requests is up the hierarchy. When a cache does not hold a requested object, it may ask its neighbor caches via ICP whether any of them has the object. If there is a HIT, the cache requests the object from that neighbor. Otherwise, the cache must forward the request either to a parent or directly to the origin server.
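The upward flow can be sketched as a recursive lookup. In this hypothetical model, the check `url in sibling.objects` stands in for an ICP query that returns HIT, a cache always forwards a miss to its parent when it has one (the slide also allows going directly to the origin), and fetched objects are not stored on the way back. The `Cache` class and the `"origin"` marker are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Cache:
    name: str
    objects: set = field(default_factory=set)         # URLs cached here
    siblings: List["Cache"] = field(default_factory=list)
    parent: Optional["Cache"] = None

    def resolve(self, url, path=None):
        """Return the names of the caches a request visits, ending at its source."""
        path = (path or []) + [self.name]
        if url in self.objects:
            return path  # hit in this cache
        for sibling in self.siblings:
            if url in sibling.objects:  # stands in for an ICP HIT from a sibling
                return path + [sibling.name]
        if self.parent is not None:
            return self.parent.resolve(url, path)  # go one level up
        return path + ["origin"]  # top of the hierarchy: contact the origin server
```

For example, a leaf cache whose sibling holds `A` and whose parent holds `B` resolves `A` at the sibling, `B` at the parent, and any other URL at the origin server by way of the parent.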
10 Parent and Sibling Caches
A “neighbor hit” may be fetched from either a parent or a sibling cache, but a “neighbor miss” may NOT be fetched from a sibling. In other words, a sibling relationship allows retrieving only objects the sibling already has cached, while a parent relationship allows retrieving any object, regardless of whether or not it is cached.
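The parent/sibling rule reduces to a one-line check. A minimal sketch; the `Relationship` enum and the `may_fetch` function are hypothetical names introduced for illustration.

```python
from enum import Enum


class Relationship(Enum):
    PARENT = "parent"
    SIBLING = "sibling"


def may_fetch(relationship, neighbor_hit):
    """Return True if the object may be requested from this neighbor.

    A parent may be asked for any object, hit or miss; a sibling may only
    be asked for objects it already has cached (an ICP HIT).
    """
    if relationship is Relationship.PARENT:
        return True
    return neighbor_hit
```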
11 ICP Additional Delay
Caches are designed to answer ICP requests quickly: the application does minimal processing of each ICP request, so most ICP-related delay is due to transmission on the network. ICP also serves as an indication of neighbor reachability. If ICP replies from a neighbor fail to arrive, that neighbor should not be used for the time being, since either the network path is congested (or down) or the cache application is not running on the queried neighbor machine. ICP also provides some form of load balancing, because an idle cache can reply faster than a busy one.
12 Determine Whether to Use ICP
Not every HTTP request requires an ICP query. Obviously, cache hits do not need ICP, because the request is satisfied immediately. For origin servers very close to the cache, we do not want to use any neighbor caches. For some classes of requests, the cache (or the administrator) may prefer to forward the request directly to the origin server, such as all non-GET request methods and URLs containing certain strings (e.g. “cgi-bin”).
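The checks above can be combined into a single predicate. This is a sketch under assumptions: `should_send_icp` is a hypothetical name, and `direct_substrings` and `nearby_origins` model administrator-configured policy whose defaults are illustrative only.

```python
from urllib.parse import urlsplit


def should_send_icp(method, url, local_hit,
                    direct_substrings=("cgi-bin",),
                    nearby_origins=frozenset()):
    """Decide whether a request is worth an ICP query to the neighbors."""
    if local_hit:
        return False  # satisfied immediately from this cache
    if method.upper() != "GET":
        return False  # non-GET methods go directly to the origin server
    if any(s in url for s in direct_substrings):
        return False  # URL matches an administrator-listed string
    if urlsplit(url).hostname in nearby_origins:
        return False  # origin is close enough to contact directly
    return True
```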
13 Source Selection
The cache sends queries to each peer. To maximize the chance of getting a HIT reply from one of the peers, the cache waits for ICP replies until either a HIT arrives or all replies have been received, subject to a query timeout. On a HIT reply, object retrieval commences immediately from the replying peer. When all peers reply MISS (or the timeout expires), either a parent cache or the origin server is selected.
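The selection rule can be sketched as a function over replies tagged with simulated arrival times in milliseconds. This is a hypothetical model: `select_source`, the `"HIT"`/`"MISS"` strings (rather than real ICP packets), and the single `parent` fallback are assumptions made for illustration.

```python
def select_source(replies, peers, timeout_ms, parent=None):
    """Pick where to fetch an object from, given timed ICP replies.

    replies: iterable of (arrival_ms, peer, opcode), opcode "HIT" or "MISS";
             peers that never answered simply have no entry.
    peers:   the peers that were queried.
    parent:  parent cache to use when no peer reports a HIT (None = origin).
    Returns (source, decision_ms): the chosen source and when it was chosen.
    """
    fallback = parent if parent is not None else "origin"
    answered = set()
    for arrival_ms, peer, opcode in sorted(replies):
        if arrival_ms > timeout_ms:
            break  # replies after the query timeout are ignored
        if opcode == "HIT":
            return peer, arrival_ms  # start retrieval immediately
        answered.add(peer)
        if answered >= set(peers):
            return fallback, arrival_ms  # every peer missed; no need to wait
    return fallback, timeout_ms  # timeout expired without a HIT
```

Returning early once every queried peer has answered MISS avoids paying the full timeout when all neighbors are reachable; only a silent neighbor forces the cache to wait the whole timeout.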
14 Multicast for Efficient Distribution
A cache may deliver ICP queries to a multicast address, and neighbor caches may join the multicast group to receive such queries. With multicast, however, the querying cache has no way to know exactly how many replies to expect. ICP replies are sent to the querying cache's unicast address, for three reasons: multicasting the replies would not reduce the number of packets sent; unicast replies keep other group members from receiving unexpected replies; and the reply should follow the unicast routing path, to indicate connectivity between the receiver and the sender, since the subsequent HTTP request will be unicast-routed.
15 Differences Between ICP and HTTP
HTTP supports a rich and sophisticated set of features, while ICP was designed to be simple, small, and efficient. HTTP request and reply headers consist of lines of ASCII text, whereas ICP uses a fixed-size header and represents numbers in binary.
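To make the contrast concrete, here is a sketch of the ICPv2 fixed header as RFC 2186 lays it out: a 20-byte header (opcode, version, message length, request number, options, option data, sender host address) in network byte order, followed by the payload. For a query, the payload is the requester's host address plus a NUL-terminated URL; for a HIT or MISS reply, it is just the NUL-terminated URL. The helper names are hypothetical, only three opcodes are shown, and the options, option data, and address fields are simply left at zero.

```python
import struct
from typing import NamedTuple

# ICPv2 opcodes (RFC 2186); only the three used in this sketch.
ICP_OP_QUERY = 1
ICP_OP_HIT = 2
ICP_OP_MISS = 3
ICP_VERSION = 2

# 20-byte fixed header in network byte order ("!"): opcode (1 byte),
# version (1), message length (2), request number (4), options (4),
# option data (4), sender host address (4 raw bytes).
HEADER = struct.Struct("!BBHIII4s")


class IcpMessage(NamedTuple):
    opcode: int
    version: int
    length: int
    request_number: int
    url: str


def encode(opcode, request_number, url):
    """Build an ICP query or reply; options and addresses are left as zeros."""
    payload = url.encode("ascii") + b"\0"
    if opcode == ICP_OP_QUERY:
        payload = b"\0\0\0\0" + payload  # requester host address (zeroed here)
    length = HEADER.size + len(payload)  # total length includes the header
    header = HEADER.pack(opcode, ICP_VERSION, length, request_number,
                         0, 0, b"\0\0\0\0")
    return header + payload


def decode(data):
    """Parse the fixed header and extract the URL from the payload."""
    (opcode, version, length, request_number,
     _options, _option_data, _sender) = HEADER.unpack_from(data)
    payload = data[HEADER.size:length]
    if opcode == ICP_OP_QUERY:
        payload = payload[4:]  # skip the requester host address
    url = payload.split(b"\0", 1)[0].decode("ascii")
    return IcpMessage(opcode, version, length, request_number, url)
```

A query for `http://example.com/` encodes to 44 bytes: 20 for the header, 4 for the requester address, and 20 for the URL plus its terminating NUL.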
16 CARP - Cache Array Routing Protocol
Microsoft® Proxy Server 2.0 uses the Cache Array Routing Protocol (CARP), a series of algorithms applied on top of HTTP. Multiple proxy servers are arrayed as a single logical cache. CARP does not require a new wire protocol: it uses HTTP, so it is compatible with existing firewalls and proxy servers.
17 Hash-based Routing
CARP provides a deterministic "request resolution path" through an array of proxies, computed by hashing the identities of the proxy array members together with the requested URLs. For any given URL request, the proxy server knows exactly where in the proxy array the object is stored, or will be stored if it has not been cached yet.
18 Benefits
Because the request resolution path is deterministic, the query messaging between proxy servers that ICP requires is no longer needed. CARP also eliminates the duplication of contents that otherwise occurs on an array of proxy servers, and it scales positively: the array becomes faster and more efficient as more proxy servers are added.
19 How CARP Works
A hash value is computed for the name of each proxy server, and a hash value is computed for each requested URL. The hash value of the URL is then combined with the hash value of each proxy. Whichever URL+proxy combination yields the highest value determines the proxy that becomes the "owner" of the cached object. If a server fails, its URLs are automatically rerouted to the server with the next highest score.
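The highest-score rule is the technique now usually called rendezvous (highest-random-weight) hashing, and it can be sketched as follows. This is not CARP's actual hash: as a stand-in, every 32-bit value here is the first four bytes of a SHA-256 digest, and the URL and proxy hashes are combined by hashing them together, whereas the CARP specification defines its own hash and combination functions. The function names and proxy names are hypothetical.

```python
import hashlib


def hash32(text):
    """Stand-in 32-bit hash: first 4 bytes of SHA-256 (not CARP's function)."""
    return int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")


def combined_score(url_hash, proxy_hash):
    """Combine a URL hash with a proxy hash into a 32-bit score (stand-in)."""
    return hash32(f"{url_hash:08x}:{proxy_hash:08x}")


def rank(url, proxies):
    """Return the proxies ordered from highest to lowest score for this URL."""
    url_hash = hash32(url)
    return sorted(
        proxies,
        # the proxy name breaks (extremely unlikely) score ties deterministically
        key=lambda p: (combined_score(url_hash, hash32(p)), p),
        reverse=True,
    )


def owner(url, proxies, down=frozenset()):
    """The highest-scoring proxy that is not marked down owns the URL."""
    for proxy in rank(url, proxies):
        if proxy not in down:
            return proxy
    raise LookupError("no live proxy in the array")
```

Because `rank` depends only on the URL and the member names, every client computes the same owner without querying any proxy, and when the owner is in `down` the URL falls through to the next-highest scorer, matching the failover rule above.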
20 How CARP Works (cont.)
The result is a deterministic location for all cached information: a Web browser or downstream proxy server can know exactly where a requested URL is already stored, or where it will be located after caching. Because the hash values span such a large range (2^32 = 4,294,967,296 possible values), the load is statistically distributed across the array.
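The statistical load balancing, and the small disruption when a proxy is added, can be checked with a short simulation. A sketch under stated assumptions: the 32-bit score is a stand-in taken from SHA-256 rather than CARP's own hash, and the URLs and proxy names are made up. It counts how 10,000 URLs spread over four proxies, then adds a fifth proxy and records which URLs change owner.

```python
import hashlib
from collections import Counter


def score(proxy, url):
    """Stand-in 32-bit score for a (proxy, URL) pair; not CARP's hash."""
    digest = hashlib.sha256(f"{proxy}|{url}".encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def owner(url, proxies):
    """Highest score wins; the name breaks ties deterministically."""
    return max(proxies, key=lambda p: (score(p, url), p))


def simulate(n_urls=10_000):
    urls = [f"http://example.com/page{i}" for i in range(n_urls)]
    four = ["p1", "p2", "p3", "p4"]
    five = four + ["p5"]
    before = {u: owner(u, four) for u in urls}
    after = {u: owner(u, five) for u in urls}
    load = Counter(before.values())  # URLs owned by each of the four proxies
    moved = [u for u in urls if before[u] != after[u]]
    # With highest-score routing, a URL can only change owner by moving to p5.
    moved_to = Counter(after[u] for u in moved)
    return load, moved, moved_to
```

With 10,000 URLs each proxy should receive roughly a quarter of them, and adding `p5` should move only about a fifth of the URLs, all of them to `p5`; no URL is reshuffled between the existing members, which is one way to read the claim that the array scales well as proxies are added.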
21 Updating the Membership List
An array manager maintains the current list of members of a particular proxy array. All proxy servers in the array store their own local copies of the array list and periodically request updates from the array manager. They also watch all HTTP requests to any array member; if a request fails, they mark that member down until the next update from the array manager.
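The local bookkeeping each proxy performs can be sketched as a small class. This is a hypothetical illustration: the class and method names are invented, the periodic polling of the array manager is left out, and an update is assumed to replace the whole local list and clear any local down marks.

```python
class ArrayMembership:
    """Local copy of a proxy array's member list (hypothetical sketch)."""

    def __init__(self, members):
        self.members = list(members)
        self.down = set()  # members marked down locally after a failed request

    def apply_update(self, members):
        """Replace the list with the array manager's copy; clear down marks."""
        self.members = list(members)
        self.down.clear()

    def report_failure(self, member):
        """Mark a member down after an HTTP request to it fails."""
        if member in self.members:
            self.down.add(member)

    def live_members(self):
        """Members eligible for routing until the next update arrives."""
        return [m for m in self.members if m not in self.down]
```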