1 Caching in HTTP Representation and Management of Data on the Internet.

2 Reasons for Using Web Caches
Reduce latency
– Since the cache is closer to the client, it takes less time for the client to get the object and display it
Save bandwidth
– Since each object is fetched from the server only once, caching reduces the amount of bandwidth used by a client

3 Types of Web Caches
Browser caches
– A portion of the hard disk is used to store objects that have already been displayed
– If an object is requested again (for example, by hitting the "Back" button), the request is served from the browser cache
Proxy caches
– These are shared caches: they serve many users

4 For example, how much traffic is saved if the Google logo does not have to be sent with each search result?

5 Caching Improves Performance in Two Ways
In some cases, caching eliminates the need to send requests, by using an expiration mechanism
In other cases, caching eliminates the need to send full responses, by using a validation mechanism

6 An Example of Using a Validation Mechanism
[Figure: client, cache and server exchanging the following messages]
Client sends GET /fruit/apple.gif
Server responds with the object and a Last-Modified: ... header
Client caches the object and its last-modified date
Client later sends GET /fruit/apple.gif … If-Modified-Since: …
Server returns either 304 Not Modified or the object

7 Proxy Caches
[Figure: client, proxy server and origin server; the client's request GET /fruit/apple.gif passes through the proxy server]

8 Benefit of Caching
[Figure: clients and a proxy server on a 10 Mbps LAN, connected through routers (R) and a 1.5 Mbps link across the Internet to the origin server; load: 15 req/sec, 100 Kbits/req]
A 50% hit rate is possible, because the cache is shared by many users and, therefore, there are many shared hits
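The figure's numbers can be checked with a little arithmetic: 15 req/sec × 100 Kbits/req = 1.5 Mbps, which fills the 1.5 Mbps access link completely. A minimal sketch, assuming that hits are served entirely from the proxy on the LAN and that every miss crosses the access link (the function name is hypothetical):

```python
def access_link_utilization(req_per_sec: float, kbits_per_req: float,
                            link_mbps: float, hit_rate: float) -> float:
    """Fraction of the access link consumed by cache misses.

    Assumes hits never leave the LAN and every miss crosses the access link.
    """
    miss_traffic_kbps = req_per_sec * kbits_per_req * (1.0 - hit_rate)
    return miss_traffic_kbps / (link_mbps * 1000.0)


# Figures from the slide: 15 req/sec, 100 Kbits/req, 1.5 Mbps access link.
without_cache = access_link_utilization(15, 100, 1.5, hit_rate=0.0)
with_cache = access_link_utilization(15, 100, 1.5, hit_rate=0.5)
print(f"without cache: {without_cache:.0%}, with 50% hits: {with_cache:.0%}")
```

Without a cache the link runs at 100% utilization, where queueing delays become very large; a 50% hit rate halves the load on it.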

9 Points to Consider When Designing a Web Site
Caches can help the Web site load faster
Caches may "hide" the users of the Web site, making it difficult to see who is using the site
Caches may serve content that is out of date, or stale

10 The Risk in Caching
A response might not be "semantically transparent"
– that is, the response is different from what would have been returned by the origin server
The cache should verify that its copy is fresh
A copy is stale if it is not fresh

11 Cases Where Objects Are Not Cached
In the following cases, objects are usually not cached:
– The object's headers tell the cache not to keep the object
– The object has no validator (neither an ETag nor a Last-Modified header) and no explicit freshness information
– The object is authenticated or secured

12 Fresh Objects Are Served from the Cache
An object is fresh in the following cases:
– The object has an expiry time or other age-controlling directive, and is still within its freshness period
– The browser cache has already seen the object, and has been set to check once per session
– A proxy cache has received the object recently, and the object was modified relatively long ago (this is a heuristic; see later)

13 Validating an Object
If the object is stale (i.e., not fresh), the cache asks the origin server to validate the object
In response, the origin server either
– tells the cache that the object has not changed, or
– sends a new copy of the object to the cache

14 The Expires HTTP Header
A response may include an Expires header:
Expires: Wed, 30 Oct 2002 14:19:41 GMT
If an expiry time is not specified, the cache can estimate one heuristically
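An Expires value is an HTTP-date, always given in GMT, and its weekday must agree with the date. A minimal sketch using the standard library's `email.utils.format_datetime`, assuming the server wants the object to expire a fixed number of seconds after the response is generated (`expires_header` is a hypothetical helper):

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_header(now: datetime, lifetime_seconds: int) -> str:
    """Build an Expires header value `lifetime_seconds` after `now`."""
    expiry = now.astimezone(timezone.utc) + timedelta(seconds=lifetime_seconds)
    # usegmt=True emits the "GMT" zone name that HTTP requires;
    # the weekday is computed from the date, so it is always consistent.
    return format_datetime(expiry, usegmt=True)


now = datetime(2002, 10, 30, 13, 19, 41, tzinfo=timezone.utc)
value = expires_header(now, 3600)
print("Expires:", value)  # Expires: Wed, 30 Oct 2002 14:19:41 GMT
print(parsedate_to_datetime(value) - now)  # 1:00:00
```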

15 A Possible Heuristic
If the cache received the object 10 hours after it was last modified, it can heuristically set the expiry time to 1 hour after it received the object
In general, the expiry time is the time of receipt plus 10% of the interval between the last-modification time (given by the Last-Modified header) and the time the object was received
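The 10% rule fits in a few lines. This sketch assumes the Last-Modified value and the time of receipt have already been parsed into timezone-aware datetimes; `heuristic_expiry` is a hypothetical name, and 10% is the typical fraction, not a fixed requirement:

```python
from datetime import datetime, timedelta, timezone


def heuristic_expiry(last_modified: datetime, received: datetime,
                     fraction: float = 0.10) -> datetime:
    """Estimate an expiry time for a response without explicit freshness info.

    The object stays fresh for `fraction` of the time that had elapsed
    between its last modification and its receipt by the cache.
    """
    age_at_receipt = received - last_modified
    return received + age_at_receipt * fraction


modified = datetime(2002, 10, 30, 0, 0, tzinfo=timezone.utc)
received = modified + timedelta(hours=10)  # the slide's 10-hour example
expiry = heuristic_expiry(modified, received)
print(expiry - received)  # 1:00:00
```

Objects that have not changed for a long time get long heuristic lifetimes, while recently modified objects are rechecked soon.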

16 Cache-Control Headers (Introduced in HTTP/1.1)
The following are possible Cache-Control directives in responses:
max-age=[seconds]
– Specifies the maximum amount of time that an object will be considered fresh (similar to the Expires header)
s-maxage=[seconds]
– Similar to max-age, except that it applies only to proxy (shared) caches
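A cache combines these directives with Expires roughly as follows in HTTP/1.1: s-maxage (shared caches only) takes precedence, then max-age, then Expires minus Date; if none is present, the cache falls back on a heuristic. A sketch under the assumption that headers arrive as a plain dict with canonical capitalization (real code needs case-insensitive lookup); both function names are hypothetical:

```python
from email.utils import parsedate_to_datetime


def parse_cache_control(value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def freshness_lifetime(headers: dict, shared: bool):
    """Seconds a stored response stays fresh, or None if not specified.

    headers: response headers in a dict with canonical names (assumption).
    None means the cache has to fall back on a heuristic.
    """
    cc = parse_cache_control(headers.get("Cache-Control", ""))
    if shared and cc.get("s-maxage") is not None:
        return int(cc["s-maxage"])
    if cc.get("max-age") is not None:
        return int(cc["max-age"])
    if "Expires" in headers and "Date" in headers:
        delta = (parsedate_to_datetime(headers["Expires"])
                 - parsedate_to_datetime(headers["Date"]))
        return max(0, int(delta.total_seconds()))
    return None


resp = {"Cache-Control": "max-age=600, s-maxage=3600"}
print(freshness_lifetime(resp, shared=False))  # 600
print(freshness_lifetime(resp, shared=True))   # 3600
```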

17 More Cache-Control Headers
must-revalidate
– Tells caches that they must obey any freshness information provided with the object (otherwise, HTTP allows caches to take liberties with the freshness of objects)
proxy-revalidate
– Similar to must-revalidate, except that it applies only to proxy (shared) caches

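18 No-Cache
Some Cache-Control directives are meaningful in both responses and requests
no-cache
– In a response, it means that the cache must not reuse its copy of the object without first revalidating it with the origin server
– In a request, it means that the response must come from the origin server, or at least be validated by it, rather than straight from a cache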
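The directives on the last two slides combine into a small decision procedure. This sketch assumes the caller already knows whether the stored copy is fresh and has reduced each Cache-Control header to a set of lower-cased directive names; the result labels ("serve", "revalidate", "serve-stale", "error") are made up for illustration, with "error" standing for a 504 Gateway Timeout:

```python
def cache_decision(request_cc: set, response_cc: set, is_fresh: bool,
                   shared: bool, origin_reachable: bool = True) -> str:
    """Decide what a cache does with a stored copy when a request arrives.

    request_cc / response_cc: lower-cased directive names from the request's
    and the stored response's Cache-Control headers.
    """
    if "no-cache" in request_cc or "no-cache" in response_cc:
        # no-cache: the copy may be reused only after the origin confirms it.
        return "revalidate" if origin_reachable else "error"
    if is_fresh:
        return "serve"
    if origin_reachable:
        return "revalidate"  # stale: ask the origin (304 or a new copy)
    # Origin unreachable.  HTTP lets a disconnected cache serve a stale copy,
    # unless must-revalidate (or proxy-revalidate, in a shared cache) is set.
    strict = "must-revalidate" in response_cc or (
        shared and "proxy-revalidate" in response_cc)
    return "error" if strict else "serve-stale"
```

Note how proxy-revalidate only changes the outcome when `shared` is true, which is exactly the difference between it and must-revalidate.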

19 Who Adds Cache-Control Headers?
The server
– The configuration of the server determines which Cache-Control headers are added to responses
– The author of the page can add headers by means of the .htaccess file (only in the Apache server)
The author of the page –

20 Validators
A validator is any mechanism that may help in determining whether a cached copy is still the same as the one on the origin server
– A strong validator is, for example, a counter that is incremented whenever the resource is changed
– A weak validator is, for example, a counter that is incremented only when a significant change is made
For example, if the only change in the site is the number of visitors …

21 Last-Modified Header
The most common validator is the time that the document last changed, the last-modified time
– It is given by the Last-Modified header
– This header should be included in every response
– It is a weak validator if an object can change more than once within a one-second interval
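Why the one-second limit makes Last-Modified weak: HTTP-dates carry no fractions of a second. In this sketch, two hypothetical edits 300 ms apart produce identical header values, so a cache holding the first version could not tell that the second one exists:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime

# Two hypothetical edits to the same file, 300 ms apart.
first_edit = datetime(2002, 10, 30, 14, 19, 41, 100000, tzinfo=timezone.utc)
second_edit = first_edit + timedelta(milliseconds=300)

# format_datetime drops the sub-second part, just as an HTTP-date does.
first_header = format_datetime(first_edit, usegmt=True)
second_header = format_datetime(second_edit, usegmt=True)
print(first_header == second_header)  # True
```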

22 ETag (Entity Tag)
ETag is a validator generated by the server (i.e., an identifier of a specific version of the object)
– It is part of the HTTP/1.1 specification (not available in HTTP/1.0)
The preferred behavior for an HTTP/1.1 origin server is to send both a strong entity tag and a Last-Modified value
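HTTP leaves it to the server to decide how ETags are built. One common approach, shown here as a sketch and not as what any particular server does, hashes the body to get a strong tag; a weak tag, marked with the `W/` prefix, can come from a "significant change" counter like the one on the Validators slide. Both function names are hypothetical:

```python
import hashlib


def strong_etag(body: bytes) -> str:
    """Strong entity tag derived from the exact bytes of the object.

    Any change to the body, however small, yields a different tag.
    (Truncating the SHA-256 digest to 64 bits keeps the tag short.)
    """
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def weak_etag(significant_version: int) -> str:
    """Weak entity tag (W/ prefix) from a "significant change" counter."""
    return f'W/"v{significant_version}"'


print(weak_etag(3))  # W/"v3"
```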

23 Conditional Requests
Some conditional headers are
– If-Modified-Since
– If-Unmodified-Since
– If-None-Match
These headers are used to validate an object (i.e., to check with the origin server whether the object has changed)

24 If-Modified-Since Header
The If-Modified-Since: header is used with a GET request
If the requested resource has been modified since the given date, the server returns the resource as it normally would (i.e., the header is ignored)
Otherwise, the server returns a 304 Not Modified response, including the Date: header, but with no message body:
HTTP/1.1 304 Not Modified
Date: Fri, 31 Dec 1999 23:59:59 GMT
[blank line]
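On the server side, this rule is a date comparison. A minimal sketch, assuming request headers come as a dict and the resource's modification time is known as a timezone-aware datetime; `handle_get` and its (status, headers, body) return convention are hypothetical:

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def handle_get(request_headers: dict, last_modified: datetime, body: bytes):
    """Answer a GET that may carry If-Modified-Since.

    Returns (status, headers, body). `last_modified` must be timezone-aware.
    """
    now = format_datetime(datetime.now(timezone.utc), usegmt=True)
    value = request_headers.get("If-Modified-Since")
    if value is not None:
        try:
            since = parsedate_to_datetime(value)
        except (TypeError, ValueError, IndexError):
            since = None  # an invalid date means the header is ignored
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP-dates have one-second resolution, so compare whole seconds.
        if since is not None and last_modified.replace(microsecond=0) <= since:
            return 304, {"Date": now}, b""  # Date header, no message body
    headers = {
        "Date": now,
        "Last-Modified": format_datetime(last_modified.astimezone(timezone.utc),
                                         usegmt=True),
    }
    return 200, headers, body
```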

25 If-Unmodified-Since Header
The If-Unmodified-Since: header can be used with any method
If the requested resource has not been modified since the given date, the server processes the request as it normally would
Otherwise, the server returns a 412 Precondition Failed response:
HTTP/1.1 412 Precondition Failed
[blank line]
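Because it works with any method, If-Unmodified-Since is useful for updates: a client can say "overwrite this only if nobody changed it since I read it." A sketch under the same dict-of-headers assumption; `handle_put` and the `store` dict standing in for the server's storage are hypothetical:

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def unmodified_since_holds(request_headers: dict, last_modified: datetime) -> bool:
    """Evaluate If-Unmodified-Since; True means the request may proceed."""
    value = request_headers.get("If-Unmodified-Since")
    if value is None:
        return True  # no precondition was given
    try:
        since = parsedate_to_datetime(value)
    except (TypeError, ValueError, IndexError):
        return True  # an invalid date is ignored
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP-dates have one-second resolution, so compare whole seconds.
    return last_modified.replace(microsecond=0) <= since


def handle_put(request_headers: dict, last_modified: datetime,
               store: dict, new_body: bytes) -> int:
    """Hypothetical PUT handler that refuses to overwrite a changed object."""
    if not unmodified_since_holds(request_headers, last_modified):
        return 412  # Precondition Failed: the object changed in the meantime
    store["body"] = new_body
    return 204  # No Content: the update succeeded
```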

26 If-None-Match Header
If an If-None-Match header is specified and one of its ETags matches the current ETag of the object, then the object has not changed and is not returned
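If-None-Match may list several tags, or `*`, and it uses the *weak* comparison: the `W/` prefix is ignored, so `W/"v1"` matches `"v1"`. A sketch assuming the server knows the object's current ETag; the function names are hypothetical:

```python
import re

# An entity tag: optional W/ prefix, then a quoted string (commas allowed).
_ETAG = re.compile(r'(?:W/)?"[^"]*"')


def opaque(tag: str) -> str:
    """Drop the weakness indicator so two tags can be compared weakly."""
    return tag[2:] if tag.startswith("W/") else tag


def etag_listed(header_value: str, current_etag: str) -> bool:
    """True if If-None-Match names the current version (weak comparison)."""
    if header_value.strip() == "*":
        return True  # "*" matches any existing representation
    return any(opaque(tag) == opaque(current_etag)
               for tag in _ETAG.findall(header_value))


def respond_to_get(request_headers: dict, current_etag: str, body: bytes):
    inm = request_headers.get("If-None-Match")
    if inm is not None and etag_listed(inm, current_etag):
        return 304, b""  # the client's copy is current; no body is sent
    return 200, body
```

For methods other than GET and HEAD, a matching tag leads to 412 Precondition Failed instead of 304.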

27 Old Way Not to Use the Cache
The Pragma: no-cache request header indicates that the request should not be satisfied from a cache
– Same as the no-cache cache directive
– The directive applies to any recipient along the request/response chain
Don't use Pragma: it applies only to requests and exists just for compatibility with HTTP/1.0
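An HTTP/1.1 cache treats `Pragma: no-cache` as `Cache-Control: no-cache` only when the request carries no Cache-Control header at all, so the newer header always wins. A sketch of that rule, assuming headers come as a dict with canonical names; `wants_no_cache` is a hypothetical name:

```python
def wants_no_cache(request_headers: dict) -> bool:
    """Does this request ask caches not to answer from a stored copy?

    Cache-Control, when present, decides; Pragma is consulted only as a
    fallback for HTTP/1.0 clients.
    """
    cache_control = request_headers.get("Cache-Control")
    if cache_control is not None:
        directives = {d.strip().split("=")[0].lower()
                      for d in cache_control.split(",")}
        return "no-cache" in directives
    pragma = request_headers.get("Pragma", "")
    return "no-cache" in {p.strip().lower() for p in pragma.split(",")}
```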

28 Cooperative Caching

29 Cooperative Caching (cont.)
Higher-level cache (e.g., a national cache)
– larger user population
– higher hit rates
Multiple Web caches that cooperate → improve overall performance
Cooperative caches are usually built from clusters
– divide the traffic overhead
– improve storage capacity

30 Cooperative Caching (cont.)
Which caches should be asked for a particular document?
Hash routing (of URLs): an object will not be present in more than one cache
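Hash routing can be sketched as a deterministic map from URL to cache: every participant computes the same hash, so a given URL is always stored in, and looked up at, the same member. This uses plain modulo hashing as an illustration; the cache names are hypothetical:

```python
import hashlib


def pick_cache(url: str, caches: list) -> str:
    """Map a URL to exactly one cache in the cooperating group.

    The same URL always maps to the same member, so an object is never
    stored in more than one cache of the group.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return caches[int.from_bytes(digest[:8], "big") % len(caches)]


caches = ["cache-a", "cache-b", "cache-c"]  # hypothetical cluster members
print(pick_cache("http://www.example.com/fruit/apple.gif", caches))
```

A drawback of the modulo scheme is that adding or removing a cache remaps most URLs; schemes such as consistent hashing were designed to limit that reshuffling.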

31 Hop by Hop
HTTP/1.1 introduces the concept of hop-by-hop headers:
– Message headers that apply only to a given connection, and not to the entire path
– They give proxies (caches) much more power
– The headers give information that is directed to the proxies on the way to the client
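In practice, a proxy consumes hop-by-hop headers itself and must not forward them. A sketch of that filtering step, using the fixed list from HTTP/1.1 (RFC 2616, section 13.5.1) plus any header named in the message's own Connection header; `forwardable_headers` is a hypothetical name:

```python
# Hop-by-hop headers listed in HTTP/1.1 (RFC 2616, section 13.5.1).
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailers", "transfer-encoding", "upgrade",
}


def forwardable_headers(headers: dict) -> dict:
    """Return the headers a proxy may pass on to the next hop.

    Besides the fixed list, every header named in this message's
    Connection header is hop-by-hop as well.
    """
    connection = next(
        (value for name, value in headers.items() if name.lower() == "connection"),
        "")
    listed = {token.strip().lower() for token in connection.split(",") if token.strip()}
    return {name: value for name, value in headers.items()
            if name.lower() not in HOP_BY_HOP and name.lower() not in listed}
```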

32 Chunked Encoding
Music, video clips and other multimedia content are sent to the client in chunks of data
Among the problems are the following:
– One data chunk varies in size and composition from the next
– The size of the chunks may not be specified in the headers, so it may be difficult to recognize the end of a chunk
– There should be a way to deal with "infinite" responses, i.e., data that is very big (or endless content that is created dynamically)
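HTTP/1.1's chunked transfer coding addresses these problems by prefixing every chunk with its own size in hexadecimal and ending the body with a zero-size chunk, so neither the total length nor a fixed chunk size has to be known in advance. A minimal encoder/decoder sketch (chunk extensions and trailers are ignored; function names are hypothetical):

```python
def encode_chunked(chunks) -> bytes:
    """Frame an iterable of byte strings with chunked transfer coding."""
    out = bytearray()
    for chunk in chunks:
        if chunk:  # a zero-length chunk would end the body prematurely
            out += b"%x\r\n" % len(chunk) + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0) followed by an empty trailer
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body; extensions after ';' are skipped."""
    body, pos = bytearray(), 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```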

33 Compression
Most image and video formats (GIF, JPEG, MPEG) are precompressed
Many other data types used on the Web are not precompressed
Compression could save almost 40% of the bytes sent via HTTP
There is a need for negotiating the type of encoding of the compressed resource

34 Compression (cont.)
The client sends the Accept-Encoding header
– The header indicates the content-codings that the client can handle and the ones that the client prefers
The server sends
– the Content-Encoding header, for end-to-end encoding indication
– the Transfer-Encoding header, for hop-by-hop encoding indication (supported only in HTTP/1.1)
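The negotiation can be sketched on the server side with Python's `gzip` module: parse the q-values in Accept-Encoding, pick the preferred coding the server supports, and label the body with Content-Encoding. The supported set (gzip and identity only) and the function names are assumptions for illustration:

```python
import gzip


def choose_encoding(accept_encoding: str, supported=("gzip", "identity")) -> str:
    """Pick the supported content-coding with the highest q-value."""
    prefs = {}
    for item in accept_encoding.split(","):
        if not item.strip():
            continue
        name, *params = [part.strip() for part in item.split(";")]
        q = 1.0
        for param in params:
            if param.lower().startswith("q="):
                q = float(param[2:])
        prefs[name.lower()] = q

    def weight(coding: str) -> float:
        if coding in prefs:
            return prefs[coding]
        if "*" in prefs:
            return prefs["*"]
        # Unlisted codings are not acceptable, except identity (no encoding),
        # which is acceptable unless explicitly excluded.
        return 1.0 if coding == "identity" else 0.0

    best = max(supported, key=weight)  # ties go to the earlier entry (gzip)
    # Nothing acceptable: fall back to no encoding (a server could send 406).
    return best if weight(best) > 0 else "identity"


def encode_body(body: bytes, accept_encoding: str):
    """Return (extra headers, payload), gzip-compressed when acceptable."""
    if choose_encoding(accept_encoding) == "gzip":
        return {"Content-Encoding": "gzip"}, gzip.compress(body)
    return {}, body
```

How much is saved depends entirely on the content; repetitive text such as HTML compresses well, while already-compressed images gain almost nothing.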

35 Authentication
Many sites require users to provide a username and password in order to access the documents housed on the server
This requirement provides a mechanism for keeping track of users (more than just a security mechanism)

36 Authentication
The client sends an ordinary request message
The server responds with
– a 401 Authorization Required status code
– a WWW-Authenticate header, which specifies how to perform authentication
The client resends the request, this time including the Authorization header (e.g., user name and password)
The client continues to add this header to each subsequent request to that server
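The slides do not name an authentication scheme; the most common one is Basic, where the server's 401 response carries `WWW-Authenticate: Basic realm="..."` and the client answers with the base64-encoded string `user:password`. A sketch of both sides under that assumption (function names are hypothetical):

```python
import base64


def basic_authorization(user: str, password: str) -> str:
    """Value of the Authorization header for the HTTP Basic scheme.

    Basic only base64-encodes "user:password"; it does not encrypt it,
    so anyone who sees the request can read the credentials.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return "Basic " + token


def parse_basic(header_value: str):
    """Server side: recover (user, password) from an Authorization value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not Basic authentication")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password
```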

37 Authentication
[Figure: exchange between a client and the server www.cs.huji.ac.il]
Client: GET ~dbi/getGrade.jsp
Server: Authorization Required
Client: GET ~dbi/getGrade.jsp, with the header Authorization: user snoopy:passwordofsnoopy
Server: Response

38 Cookies
Cookies are an alternative way to identify browsers (i.e., clients)
Cookies are essentially small files that are saved in the file system of the client
A cookie can store information on the client, and thus helps the server recognize the client and obtain the information it needs about the client
How can cookies solve the problem that HTTP is stateless?

39 Cookies
The server response includes the Set-Cookie header, which has the attributes
– name=VALUE
– expires=DATE STRING
– domain=DOMAIN NAME
– path=PATH
– secure
Clients return a cookie only to a server whose URL matches the cookie's domain and path (typically the server that set the cookie)
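The domain, path and secure attributes decide which requests a stored cookie is sent with. A simplified sketch of the matching rules from RFC 6265, assuming a stored cookie is kept as a plain dict keyed by the attribute names on the slide; it ignores expiry and the special handling of IP-address hosts, and all names are hypothetical:

```python
def domain_matches(host: str, cookie_domain: str) -> bool:
    """Exact host, or a subdomain of the cookie's domain."""
    host, cookie_domain = host.lower(), cookie_domain.lower().lstrip(".")
    return host == cookie_domain or host.endswith("." + cookie_domain)


def path_matches(request_path: str, cookie_path: str) -> bool:
    """/docs matches /docs, /docs/ and /docs/a, but not /docsx."""
    if request_path == cookie_path:
        return True
    return request_path.startswith(cookie_path) and (
        cookie_path.endswith("/") or request_path[len(cookie_path)] == "/")


def should_send(cookie: dict, host: str, path: str, is_https: bool) -> bool:
    """Decide whether a stored cookie goes into a request's Cookie header."""
    if cookie.get("secure") and not is_https:
        return False  # secure cookies travel only over secure connections
    return (domain_matches(host, cookie["domain"])
            and path_matches(path, cookie.get("path", "/")))
```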

40 Cookies
Example:
– A client contacts a Web site for the first time
– The server response includes the header Set-Cookie: 1678453
– The client stores the cookie value and the server name in a special "cookie file"
– For each further request to that server, the client adds the header Cookie: 1678453
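The same exchange can be reproduced with Python's standard `http.cookies.SimpleCookie`. Real Set-Cookie headers carry a name=value pair, so this sketch assumes a hypothetical cookie name `id` for the slide's value:

```python
from http.cookies import SimpleCookie

# Server side: emit a Set-Cookie header ("id" is a hypothetical name).
outgoing = SimpleCookie()
outgoing["id"] = "1678453"
outgoing["id"]["path"] = "/"
set_cookie_line = outgoing.output()  # Set-Cookie: id=1678453; Path=/

# Client side: parse the header value and echo name=value back.
received = SimpleCookie()
received.load(set_cookie_line[len("Set-Cookie: "):])
cookie_header = "; ".join(f"{name}={m.value}" for name, m in received.items())
print("Cookie:", cookie_header)  # Cookie: id=1678453
```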

41 Cookies (cont.)
Usage:
– The server requires authentication, but doesn't want to hassle the user with a user name and password
– Remembering users' preferences for advertising
– Creating a virtual shopping cart
Problems:
– Users who access the same site from different machines
– Privacy

42 Links
For specifications and additional information:
– http://www.w3.org/Protocols/
– http://www.w3.org/Protocols/Specs.html
– http://www.jmarshall.com/easy/http/
– http://wdvl.com/Internet/Protocols/HTTP/article.html
– Caching Tutorial for Web Authors and Webmasters (http://www.mnot.net/cache_docs/)
