A tutorial on Web Caching

Presentation on theme: "A tutorial on Web Caching"— Presentation transcript:

1 HTTP Caching
A tutorial on Web Caching

2 Goals
Understanding of HTTP Caching
  Common terms
  Web environment without and with caching
  Benefits of caching
  Types of cache
  Cache processing mechanism
HTTP headers related to Caching
Common HTTP Caching Scenarios
Queries

3 Understanding of HTTP Caching

4 Prerequisites
Before this tutorial, make sure you have completed the following tutorials:
HTTP Tutorial
HTTP Headers Tutorial
HTTP Cookies Tutorial

5 Common Terms

6 Cache (Dictionary Meaning): A safe place for hiding and storing things.
Cache (Computer): A high-speed buffering technique used in various places to speed up access, such as a disk cache or a file cache.
Web cache: A web cache stores copies of web documents passing through it; subsequent requests may be satisfied from the cache if certain conditions are met.

7 Origin Server: The server that owns a resource and can send a response for it straight back to the client. It is the last server to receive the request, and it originates the reply to the client.
Proxy Server: A server (a computer system or an application program) that acts as an intermediary for requests from clients seeking resources from other servers.
Gateway: A gateway translates the identifier for the requested resource into another form and requests the translated resource from another, often non-HTTP, server.

8 Cache Hit: A request arrives at the cache, and the cache can serve it from its local copy.
Cache Miss: A request arrives at the cache, but the cache does not have the response available and has to get it from the server.
Revalidation: From time to time the cache needs to check with the server whether its cached documents are still up to date. This is known as the Freshness Check or Revalidation.

9 Web Environment Without Cache

10 [Diagram: Browsers, Intranet, Local ISP/Proxy, ISP/Proxy, Data Center, Web Servers & App Servers]

11 Problems

12 Redundant Data Transfers
When multiple clients access a popular origin server page, the server transmits the same document multiple times, once to each client.
The same bytes travel across the network over and over again.
These redundant data transfers eat up expensive network bandwidth, slow down transfers, and overload web servers.

13 Bandwidth Bottlenecks
Clients access servers at the speed of the slowest network on the way.
Flash Crowds: Many people accessing a web document at nearly the same time.

14 Distance Delays
Every network router adds delays to Internet traffic. Even if there are not many routers between client and server, the speed of light alone can cause a significant delay. The direct distance from Boston to San Francisco is about 2,700 miles. In the very best case, at the speed of light (186,000 miles/sec), a signal could travel from Boston to San Francisco in about 15 milliseconds and complete a round trip in 30 milliseconds.
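The arithmetic behind the distance-delay figures can be checked with a short sketch. The function names are illustrative only; the constants are the ones quoted on the slide. The result is roughly 14.5 ms one way, which the slide rounds up to 15 ms.

```python
# Best-case propagation delay, using the figures quoted on the slide.
SPEED_OF_LIGHT_MILES_PER_SEC = 186_000


def one_way_delay_ms(distance_miles: float) -> float:
    """Best-case one-way propagation delay in milliseconds."""
    return distance_miles / SPEED_OF_LIGHT_MILES_PER_SEC * 1000


def round_trip_delay_ms(distance_miles: float) -> float:
    """Best-case round-trip propagation delay in milliseconds."""
    return 2 * one_way_delay_ms(distance_miles)


if __name__ == "__main__":
    boston_to_sf_miles = 2_700
    print(f"one way:    {one_way_delay_ms(boston_to_sf_miles):.1f} ms")
    print(f"round trip: {round_trip_delay_ms(boston_to_sf_miles):.1f} ms")
```

Real delays are larger, since signals in fiber or copper travel slower than light in a vacuum and every router adds its own delay.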

15 Web Caching Environment

16 [Diagram: the same components as slide 10 (Browsers, Intranet, Local ISP/Proxy, ISP/Proxy, Data Center, Web Servers & App Servers), now with caches placed along the path]

17 Benefits of Web Caching

18 Reduces latency
Reduces network traffic
Reduces bandwidth usage
Reduces server load
Reduces distance delays

19 Types of Web Cache

20 Private Cache: Used exclusively by one user.
  Browser Cache
Public Cache: Shared between many users.
  Proxy Cache
  Gateway Cache

21 Private Cache
Browser Cache: IE, Mozilla and other browsers have a cache setting that lets the user set cache policies and the disk space allotted to the cache. The browser keeps a folder in which certain downloaded items are stored for future access, which makes pages load faster. The cache can be emptied periodically to reclaim storage.

22 Public Cache
Proxy Cache: A web proxy cache is a function of a proxy server that caches retrieved Web pages on the server's hard disk so that a page can be quickly retrieved by the same or a different user the next time it is requested. It eases bandwidth requirements and reduces the delays inherent in a heavily trafficked, Internet-connected network. The proxy cache is also advantageous when browsing multiple pages of the same Web site.

23 Public Cache
Gateway Cache: Also known as “reverse proxy caches” or “surrogate caches,” gateway caches are also intermediaries, but instead of being deployed by network administrators to save bandwidth, they’re typically deployed by Webmasters themselves to make their sites more scalable, reliable and better performing. Requests can be routed to gateway caches by a number of methods, but typically some form of load balancer is used to make one or more of them look like the origin server to clients. A reverse proxy is a proxy server that is installed on a server network or on network equipment. Reverse proxies are used in front of Web servers.

24 Cache processing steps

25 Receiving: Cache reads the arriving request message from the network.
Parsing: Cache parses the message, extracting the URL and headers.
Lookup: Cache checks if a local copy is available and, if not, fetches a copy (and stores it locally if it is cacheable).
Freshness check: Cache checks if the cached copy is fresh enough and, if not, asks the server for any updates.

26 Response creation: Cache makes a response message with the new headers and the cached body.
Sending: Cache sends the response back to the client over the network.
Logging: Optionally, cache creates a log file entry describing the transaction.

27 Cache Processing Flow

28 Request arrives.
Cached? If no: fetch from server, store in cache, serve to client.
If yes: Fresh enough? If yes: serve to client.
If not fresh enough: revalidate with server.
Revalidated? If yes: update the freshness of the cached document, serve to client.
If no: fetch from server, store in cache, serve to client.
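The processing flow can be sketched as a small in-memory cache. This is a minimal illustration, not a real HTTP cache: the `Entry` and `SimpleCache` names, the injected `fetch` and `revalidate` callables, and the label returned alongside the body are all assumptions made for the example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Entry:
    body: str
    validator: str    # e.g. an entity tag kept with the cache entry
    stored_at: float  # when the entry was stored or last revalidated
    lifetime: float   # seconds for which the entry counts as fresh


class SimpleCache:
    """Hypothetical cache that follows the flow: cached? fresh? revalidated?"""

    def __init__(
        self,
        fetch: Callable[[str], Tuple[str, str, float]],  # url -> (body, validator, lifetime)
        revalidate: Callable[[str, str], bool],          # (url, validator) -> still valid?
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.fetch = fetch
        self.revalidate = revalidate
        self.clock = clock
        self.entries: Dict[str, Entry] = {}

    def get(self, url: str) -> Tuple[str, str]:
        entry = self.entries.get(url)
        if entry is None:
            # Not cached: fetch from server, store, serve.
            return self._fetch_and_store(url), "miss"
        if self.clock() - entry.stored_at <= entry.lifetime:
            # Cached and fresh enough: serve the local copy.
            return entry.body, "hit"
        if self.revalidate(url, entry.validator):
            # Stale but still valid: update freshness, serve the local copy.
            entry.stored_at = self.clock()
            return entry.body, "revalidated"
        # Stale and changed on the server: fetch, store, serve.
        return self._fetch_and_store(url), "refetched"

    def _fetch_and_store(self, url: str) -> str:
        body, validator, lifetime = self.fetch(url)
        self.entries[url] = Entry(body, validator, self.clock(), lifetime)
        return body
```

Injecting the clock and the two server calls keeps the sketch testable without a network; a real cache would issue a conditional request where `revalidate` is called.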

29 Determining Freshness

30 The most important step in cache processing is the freshness check.
We want to use caches, but we don't want to serve stale responses.
So we need to answer two questions:
How do we decide what is fresh and what is stale?
Once we know an entry is stale, how do we revalidate it?

31 Time can be used as a Freshness constraint
We need a mechanism that lets a cached entry expire by giving it a lifespan expressed in time. The Expiration model of HTTP provides this mechanism.

32 Expiration Model

33 The Expiration model is a way for the server to say how long a requested resource stays fresh. User agents should cache the response and reuse it until it is no longer fresh.
Three main functions are performed:
Assigning an expiration time to a response.
Calculating the age of a response.
Calculating the expiration of a response.

34 Age of the response is the time the response has been in the cache.
Life of a response is the time for which the cache will consider it fresh.
Expiration occurs when Age exceeds Life.
Once an entry expires, the cache has to go back to the server on the next request for that resource. Fetching a fresh copy of the entry at that point is called an end-to-end reload.

35 Expiration Time determines the life of the response. It is assigned to cached entries in one of two ways:
Server-Specified Expiration: The origin server provides an explicit expiration time in the future.
Heuristic Expiration: The cache calculates the expiration time.
Servers specify explicit expiration times using:
  the Expires header
  the max-age directive of the Cache-Control header
Heuristic Expiration employs algorithms that use other header values (such as the Last-Modified time) to estimate a plausible expiration time.
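One way to picture how a cache might pick a freshness lifetime from these sources is sketched below. The function name is hypothetical, and the heuristic (10% of the time elapsed since Last-Modified) is only one plausible choice, assumed here for illustration; caches are free to use other heuristics.

```python
from email.utils import parsedate_to_datetime
from typing import Mapping, Optional

# Assumption for this sketch: heuristic lifetime is 10% of (Date - Last-Modified).
HEURISTIC_FRACTION = 0.1


def freshness_lifetime(headers: Mapping[str, str]) -> Optional[float]:
    """Return the freshness lifetime in seconds, or None if it cannot be determined.

    Order of preference: max-age, then Expires, then a Last-Modified heuristic.
    """
    # 1. Server-specified expiration: max-age directive of Cache-Control.
    for directive in headers.get("Cache-Control", "").split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age" and value.strip().isdigit():
            return float(value)

    date = headers.get("Date")
    if date is None:
        return None
    date_value = parsedate_to_datetime(date)  # raises on a malformed date

    # 2. Server-specified expiration: Expires header, measured from Date.
    if "Expires" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        return max(0.0, (expires - date_value).total_seconds())

    # 3. Heuristic expiration: estimate from Last-Modified.
    if "Last-Modified" in headers:
        last_modified = parsedate_to_datetime(headers["Last-Modified"])
        elapsed = (date_value - last_modified).total_seconds()
        return max(0.0, elapsed * HEURISTIC_FRACTION)

    return None
```

The header names must be passed with the capitalization shown; a fuller version would use a case-insensitive mapping.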

36 The expiration mechanism applies only to responses taken from a cache and not to first-hand responses forwarded immediately to the requesting client.

37 Revalidation

38 Like the expiration model, HTTP provides a standard mechanism to handle revalidation as well.
It is known as the Validation Model.
It uses Validators and Conditional Requests as the tools for revalidation.

39 Validation Model

40 The validation model is the mechanism for a client asking the server whether a cached version of a resource is still fresh. Validation is done when the cache wants to use a stale entry to respond to a client’s request. A cache cannot do a conditional retrieval if it does not have a validator for the entity, which means it will not be refreshable after it expires.

41 When an origin server generates a full response, it attaches some sort of validator to it, which is kept with the cache entry. When a client (user agent or proxy cache) makes a conditional GET request for a resource for which it has a cache entry, it includes the associated validator in the request. A conditional GET request is one that includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field.

42 The server then checks that validator against the current validator for the entity.
If they match, it responds with a special status code (usually 304 (Not Modified)) and no entity-body. Otherwise, it returns a full response (including the entity-body).
Benefits
We don't have to pay the overhead of retransmitting the full response if the cached entry is still good.
We don't have to pay the overhead of an extra round trip if the cached entry is invalid.
It is faster than a cache miss.
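The server side of this exchange can be sketched as follows. This is a simplified illustration under stated assumptions: the function name is made up, only a single entity tag in If-None-Match is handled, and tags are compared as exact strings.

```python
from typing import Dict, Mapping, Tuple


def answer_conditional_get(
    request_headers: Mapping[str, str],
    current_etag: str,
    body: bytes,
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET that may carry If-None-Match."""
    presented = request_headers.get("If-None-Match")
    if presented is not None and presented.strip() == current_etag:
        # Validators match: the cached copy is still good, so send no entity-body.
        return 304, {"ETag": current_etag}, b""
    # No validator, or it no longer matches: send the full response.
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body
```

Either way the client gets a usable answer in a single round trip: a short 304 when its copy is valid, or the new entity when it is not.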

43 Validators

44 A validator is a header whose value can be compared with another value of the same type in order to validate the response.
For caching we use the following validators:
Last-Modified dates
Entity tags
In some cases both are used.

45 Last-Modified: A cache entry is considered to be valid if the entity has not been modified since the Last-Modified value.
Entity Tag: Entity tags are opaque validators associated with a resource that usually change when the resource changes.
Depending on how changes to the entity affect the associated validator, a validator can be classified as:
Weak
Strong

46 Strong Validators
Since both origin servers and caches compare two validators to decide whether they represent the same or different entities, one would normally expect that if the entity changes in any way, the associated validator changes as well. Such a validator is called a "strong validator." Strong validators are usable in any context. A strong validator is part of an identifier for a specific entity.
Weak Validators
There might be cases when a server prefers to change the validator only on semantically significant changes, and not when insignificant aspects of the entity change. A validator that does not always change when the resource changes is a "weak validator." Weak validators are only usable in contexts that do not depend on exact equality of an entity. A weak validator is part of an identifier for a set of semantically equivalent entities.
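The difference between the two kinds of validator shows up when entity tags are compared. The sketch below assumes the `W/` prefix marks a weak tag, as in the ETag examples later in the deck; the function names are illustrative.

```python
from typing import Tuple


def parse_etag(tag: str) -> Tuple[bool, str]:
    """Split an entity tag into (is_weak, opaque_value)."""
    tag = tag.strip()
    if tag.startswith("W/"):
        return True, tag[2:]
    return False, tag


def strong_match(a: str, b: str) -> bool:
    """Strong comparison: values are identical and neither tag is weak."""
    weak_a, value_a = parse_etag(a)
    weak_b, value_b = parse_etag(b)
    return not weak_a and not weak_b and value_a == value_b


def weak_match(a: str, b: str) -> bool:
    """Weak comparison: values are identical; either tag may be weak."""
    return parse_etag(a)[1] == parse_etag(b)[1]
```

A weak match says the two entities are semantically equivalent, which is enough to reuse a full cached page but not enough for byte-exact operations such as range requests.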

47 Cache Headers

48 The following header fields are used in caching:
Last-Modified
If-Modified-Since
If-Unmodified-Since
ETag
Vary
Age
Pragma directive
Date
Expires
Cache-Control

49 Last-Modified (Response)
Indicates the date and time at which the origin server believes the variant was last modified. Last-Modified is implicitly a weak validator.
Last-Modified: Tue, 15 Nov :45:26 GMT
If-Modified-Since (Request)
If the requested variant has not been modified since the time specified in this field, the server does not return an entity. Instead, it returns a 304 (Not Modified) response without any message body.
If-Modified-Since: Sat, 12 Oct :43:31 GMT
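The If-Modified-Since decision amounts to a date comparison, sketched below. The function name and the dates used in the usage lines are made up for illustration; they are not the slide's values.

```python
from email.utils import parsedate_to_datetime


def status_for_if_modified_since(last_modified: str, if_modified_since: str) -> int:
    """Return 304 if the variant is unchanged since the given time, else 200."""
    modified_at = parsedate_to_datetime(last_modified)
    since = parsedate_to_datetime(if_modified_since)
    # Unchanged means the last modification is not later than the client's date.
    return 304 if modified_at <= since else 200


if __name__ == "__main__":
    # Hypothetical dates, chosen only to exercise both branches.
    print(status_for_if_modified_since(
        "Mon, 01 Jan 2024 10:00:00 GMT", "Mon, 01 Jan 2024 10:00:00 GMT"))
    print(status_for_if_modified_since(
        "Mon, 01 Jan 2024 10:05:00 GMT", "Mon, 01 Jan 2024 10:00:00 GMT"))
```

Because HTTP dates have one-second resolution, two changes within the same second are indistinguishable, which is one reason Last-Modified is treated as a weak validator.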

50 If-Unmodified-Since (Request)
If the requested resource has not been modified since the time specified in this field, the server SHOULD perform the requested operation as if the If-Unmodified-Since header were not present. Otherwise, it MUST return 412 (Precondition Failed). This header is rarely used.
If-Unmodified-Since: Sat, 29 Oct :43:31 GMT

51 ETag (Response, HTTP/1.1)
An entity tag provides an "opaque" cache validator. This might allow more reliable validation in situations:
where it is inconvenient to store modification dates,
where the one-second resolution of HTTP date values is not sufficient, or
where the origin server wishes to avoid certain paradoxes that might arise from the use of modification dates.
This field provides the value of the entity tag for the requested variant.
ETag: "xyzzy"
ETag: W/"xyzzy"
ETag: ""
If both ETag and Last-Modified values are present, the subsequent conditional request should use both values.

52 The following headers are used in requests with entity tags to make the request conditional.
If-Match: The method is performed only if there is an entity that matches one of the entity tags listed in this field.
If-Match: "xyzzy"
If-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-Match: *

53 If-None-Match: A client that has one or more entities previously obtained from the resource can verify that none of those entities is current by including a list of their associated entity tags in the If-None-Match header field. If none of the entity tags match, then the server MAY perform the requested method. This is the conditional header used most of the time.
If-None-Match: "xyzzy"
If-None-Match: W/"xyzzy"
If-None-Match: "xyzzy", "r2d2xxxx", "c3piozzzz"
If-None-Match: *

54 If-Range: If the entity tag given in the If-Range header matches the current entity tag for the entity, then the server SHOULD provide the specified sub-range of the entity using a 206 (Partial Content) response. If the entity tag does not match, then the server SHOULD return the entire entity using a 200 (OK) response. If the client has no entity tag for an entity but does have a Last-Modified date, it MAY use that date in an If-Range header.
Example (both of these are valid):
If-Range: "df6b0-b4a-3be1b5e1"
If-Range: Sat, 29 Oct :43:31 GMT

55 Vary (Response)
The Vary response header lists all of the client request headers that the server considers when selecting the document or generating custom content. If the server uses information other than the headers in the request, such as the client’s IP address, time of day, or user personalization, it uses a Vary header with a value of “*”. The fields in the Vary header of a cached response (while fresh) determine whether the cache is permitted to use the same response to reply to a subsequent request without validation. An HTTP/1.1 server SHOULD include a Vary header field with any cacheable response that is subject to server-driven negotiation.
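A cache can apply Vary by remembering the request headers that came with the stored response and comparing the listed fields on each new request. The sketch below is an assumption-laden simplification: the function names are hypothetical, and header values are compared as plain strings after trimming whitespace.

```python
from typing import Mapping, Optional, Tuple


def secondary_key(
    vary: str, request_headers: Mapping[str, str]
) -> Optional[Tuple[Tuple[str, str], ...]]:
    """Build a comparison key from the request headers named in Vary.

    Returns None for "Vary: *", which means no stored response can be
    reused without going back to the server.
    """
    names = [name.strip().lower() for name in vary.split(",") if name.strip()]
    if "*" in names:
        return None
    lowered = {key.lower(): value.strip() for key, value in request_headers.items()}
    return tuple((name, lowered.get(name, "")) for name in sorted(names))


def can_reuse(
    vary: str,
    stored_request: Mapping[str, str],
    new_request: Mapping[str, str],
) -> bool:
    """True if the new request matches the stored one on every Vary field."""
    stored_key = secondary_key(vary, stored_request)
    return stored_key is not None and stored_key == secondary_key(vary, new_request)
```

For example, with `Vary: Accept-Encoding`, a response stored for a gzip-capable client would not be reused for a client that sends a different Accept-Encoding value.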

56 Age (Response)
The Age response-header field conveys the sender's estimate of the amount of time since the response (or its revalidation) was generated at the origin server. A cached response is “fresh” if its age does not exceed its freshness lifetime. Age values are non-negative decimal integers, representing time in seconds. An HTTP/1.1 server that includes a cache MUST include an Age header field in every response generated from its own cache.
Age: 3600
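The age calculation can be sketched along the lines of the algorithm in RFC 2616, section 13.2.3. All arguments are seconds on the cache's own clock (for example, Unix timestamps); the function and parameter names follow the RFC's terms but the code itself is only an illustration.

```python
def current_age(
    age_value: float,      # value of the Age header received (0 if absent)
    date_value: float,     # value of the Date header, as a timestamp
    request_time: float,   # local time when the cache sent the request
    response_time: float,  # local time when the cache received the response
    now: float,            # current local time
) -> float:
    """Estimate how old a cached response is, in seconds."""
    # Age implied by the Date header; clamp in case of clock skew.
    apparent_age = max(0.0, response_time - date_value)
    # Trust whichever estimate is older: our own or upstream caches' Age.
    corrected_received_age = max(apparent_age, age_value)
    # Add the time the response spent in transit.
    response_delay = response_time - request_time
    corrected_initial_age = corrected_received_age + response_delay
    # Add the time the response has been sitting in this cache.
    resident_time = now - response_time
    return corrected_initial_age + resident_time


def is_fresh(age: float, freshness_lifetime: float) -> bool:
    """Fresh if the age does not exceed the freshness lifetime."""
    return age <= freshness_lifetime
```

The result is deliberately conservative: every correction can only make the response look older, never younger.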

57 Pragma Directive
The Pragma general-header field is used to include implementation-specific directives that might apply to any recipient along the request/response chain.
Pragma = "Pragma" ":" 1#pragma-directive
pragma-directive = "no-cache" | extension-pragma
extension-pragma = token [ "=" ( token | quoted-string ) ]
When the no-cache directive is present in a request message, an application SHOULD forward the request toward the origin server even if it has a cached copy of what is being requested.
This is used only for compatibility with HTTP/1.0.

58 Date
The Date general-header field represents the date and time at which the message was originated. A received message that does not have a Date header field MUST be assigned one by the recipient if the message will be cached by that recipient or gatewayed via a protocol which requires a Date. Origin servers MUST include a Date header field in all responses, except in a few cases. A client without a clock MUST NOT send a Date header field in a request.
Date: Tue, 15 Nov :12:31 GMT

59 Expires
The Expires entity-header field gives the date/time after which the response is considered stale. To mark a response as “already expired,” an origin server sends an Expires date that is equal to the Date header value. To mark a response as “never expires,” an origin server sends an Expires date approximately one year from the time the response is sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one year in the future.
Expires: Thu, 01 Dec :00:00 GMT

60 Cache-Control
The Cache-Control general-header field is used to specify directives that MUST be obeyed by all caching mechanisms along the request/response chain. The directives specify behavior intended to prevent caches from adversely interfering with the request or response. Each Cache-Control directive applies to requests, to responses, or to both. HTTP/1.0 caches might not implement Cache-Control.
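Since Cache-Control carries a comma-separated list of directives, a cache first has to split the header value apart. The parser below is a minimal sketch with a hypothetical function name; it assumes no directive argument contains a comma, so quoted field-name lists are not handled.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Parse a Cache-Control header value into {directive: argument or None}.

    Directive names are lowercased, since they are case-insensitive.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, separator, argument = part.partition("=")
        directives[name.strip().lower()] = (
            argument.strip().strip('"') if separator else None
        )
    return directives
```

With the directives in a dictionary, checks such as "is `no-store` present" or "what is `max-age`" become simple lookups.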

61 Cache-Control Directives

62 Public (response)
Indicates that the response may be cached by any cache. It marks authenticated responses as cacheable; normally, if HTTP authentication is required, responses are automatically private.
Private (response)
Indicates that all or part of the response message is intended for a single user. It MUST NOT be cached by a shared cache (e.g., in a proxy) but may be cached by a private cache (e.g., in a browser).

63 no-cache (request/response)
In a response, the server sends this to prevent a cache from serving the response without revalidation. A cache may still store the response, but it must revalidate it with the origin server before each reuse.
In a request, a client sends this when it wants the entry reloaded from the origin server, for example because it thinks the expiration time has been overestimated by its own cache, by intermediate caches, or by the server’s cache, or because the cached copy is corrupted for some reason.
No-Store (request/response)
A cache must not store the request or response that contains this directive.

64 s-maxage (secs) (response)
For a shared cache, this overrides the maximum age specified by the max-age directive or the Expires header. It is ignored by a private cache.
max-age (secs) (request/response)
In a request, it is the maximum age (in seconds) of a response that the client is willing to accept. In a response, it specifies the maximum amount of time for which a representation is considered fresh.
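The interaction between s-maxage and max-age on a response can be sketched as below. The function name is hypothetical, and the parsing is deliberately minimal: only directives with a plain integer argument are considered.

```python
from typing import Dict, Optional


def response_max_age(cache_control: str, shared_cache: bool) -> Optional[int]:
    """Return the freshness lifetime in seconds that applies to this cache.

    A shared cache prefers s-maxage; a private cache ignores it.
    Returns None if no applicable directive is present.
    """
    values: Dict[str, int] = {}
    for part in cache_control.split(","):
        name, separator, argument = part.strip().partition("=")
        if separator and argument.strip().isdigit():
            values[name.strip().lower()] = int(argument.strip())
    if shared_cache and "s-maxage" in values:
        return values["s-maxage"]
    return values.get("max-age")
```

This lets a server give proxies a different lifetime than browsers for the same response, for instance a long lifetime at a gateway cache it controls and a short one everywhere else.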

65 Min-fresh (request)
The client wants a response that will still be fresh for at least min-fresh seconds.
Max-stale (request)
The client is willing to accept a stale (expired) response, provided it has been stale for no more than max-stale seconds. The cache must attach a Warning header with warn-code 110 (Response is stale) to the stale response.
No-transform (request/response)
A cache or proxy MUST NOT change the Content-Encoding, Content-Range, or Content-Type headers, or any aspect of the entity-body that is specified by these headers, including the value of the entity-body itself.
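How a cache might weigh the request directives max-age, min-fresh, and max-stale against a stored response is sketched below. The function name and signature are assumptions for illustration, and the sketch ignores response directives such as must-revalidate that can forbid serving stale content.

```python
from typing import Optional


def acceptable_to_client(
    age: float,
    lifetime: float,
    max_age: Optional[float] = None,
    min_fresh: Optional[float] = None,
    max_stale: Optional[float] = None,
) -> bool:
    """Decide whether a stored response satisfies the client's request directives.

    age and lifetime are the response's current age and freshness lifetime,
    in seconds; the optional arguments are the request directive values.
    """
    if max_age is not None and age > max_age:
        return False  # older than the client is willing to accept
    remaining = lifetime - age  # negative once the response is stale
    if min_fresh is not None and remaining < min_fresh:
        return False  # would not stay fresh long enough
    if remaining < 0:
        # Stale: acceptable only within the client's max-stale allowance.
        return max_stale is not None and -remaining <= max_stale
    return True
```

When this returns False, the cache would go on to revalidate or fetch from the server instead of answering from its store.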

66 must-revalidate (response)
If this directive is in the response received by the cache, then once the response becomes stale (based on the Expires header or max-age value) the cache must do an end-to-end revalidation before every reuse. This is required to overrule other settings, such as max-stale, that would otherwise allow the expiration time to be ignored.
proxy-revalidate (response)
Same as must-revalidate, but it applies only to shared (proxy) caches, not to non-shared user agent caches.

67 Only-if-cached (request)
The client wants the cache to return only those responses that it currently has stored, and not to reload or revalidate with the origin server.
If the response is not cached, the cache responds with a 504 (Gateway Timeout) status.

68 Cachability

69 Cacheable Response Codes
Code  Description                    Explanation
200   OK                             Success.
203   Non-Authoritative Information  Same as 200, but the sender has reason to believe that the entity headers are different from those the origin server would send.
206   Partial Content                Similar to 200, but in response to a “range” request. Cacheable if the cache supports range requests.
300   Multiple Choices               The response includes choices from which the user could make a selection.
301   Moved Permanently              The new URL is in the response headers.
410   Gone                           The requested resource has been permanently removed from the origin server.

70 Cacheable Request Methods
Method   Cacheable?
GET      Yes, by default
POST     Uncacheable by default; cacheable if Cache-Control headers allow
HEAD     May be used to update a previously cached entry
PUT      No
DELETE   No
OPTIONS  No
TRACE    No
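The two tables on cacheable status codes and methods can be combined into a simple default check. This is a sketch with hypothetical names: it covers only the default case for GET and ignores Cache-Control overrides, POST with explicit headers, and whether the cache supports range requests for 206.

```python
# Status codes listed as cacheable in the table of response codes.
CACHEABLE_STATUS_CODES = frozenset({200, 203, 206, 300, 301, 410})


def is_cacheable_by_default(method: str, status: int) -> bool:
    """True if a response to this method with this status is cacheable by default."""
    return method.upper() == "GET" and status in CACHEABLE_STATUS_CODES
```

A real cache would apply this check first and then let the response's Cache-Control directives narrow or widen the result.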

71 Caching Scenarios

72 Scenario 1 Browser Caches a Response

73 Client Caches a Response
Request (Client to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
Response (Server to Client):
HTTP/1.1 200 OK
Cache-Control: max-age=60
Content-Length: 3688
Content-Type: text/html
Client stores a copy of the response on its hard disk.

74 Client Cache Hit
Request (Client):
GET /foo/index.html HTTP/1.1
Host:
Client receives the copy of the response from its hard disk.

75 Cached Entry Expires
Request (Client to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
Response (Server to Client):
HTTP/1.1 200 OK
Cache-Control: max-age=60
Content-Length: 3688
Content-Type: text/html
Cache does an end-to-end reload and stores a copy of the response on its hard disk.

76 Scenario 2
No expiry in the response; the client uses the heuristic model to calculate the expiry time

77 Client Caches a Response
Request (Client to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
Response (Server to Client):
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul :52:37 GMT
Content-Length: 3688
Content-Type: text/html
Client stores a copy of the response on its hard disk and calculates the expiry based on the heuristic model.

78 Revalidation on expiry (Revalidate Hit)
Request (Client to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
If-Modified-Since: Fri, 23 Jul :52:37 GMT
Response (Server to Client):
HTTP/1.1 304 Not Modified
The response is revalidated; the cache refreshes the cache entry and serves the response from its local copy.

79 Revalidation on expiry (Revalidate Miss)
Request (Client to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
If-Modified-Since: Fri, 23 Jul :52:37 GMT
Response (Server to Client):
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul :55:37 GMT
Content-Length: 3688
Content-Type: text/html
The entity has changed since the cached copy was stored, so the cache replaces the cache entry with the new response and serves that to the client.

80 Scenario 3 Revalidation With Etags

81 Client Caches a Response
Request (Client to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
Date: Fri, 23 Jul :50:37 GMT
Response (Server to Client):
HTTP/1.1 200 OK
Date: Fri, 23 Jul :50:57 GMT
Expires: Fri, 23 Jul :51:37 GMT
ETag: "CZSSOOOO1000"
Content-Length: 3688
Content-Type: text/html
Client stores a copy of the response on its hard disk.

82 Revalidation on expiry (Revalidate Hit)
Request (Client to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
Date: Fri, 23 Jul :52:37 GMT
If-None-Match: "CZSSOOOO1000"
Response (Server to Client):
HTTP/1.1 304 Not Modified
Date: Fri, 23 Jul :52:40 GMT
The response is revalidated; the cache refreshes the cache entry and serves the response from its local copy.

83 Scenario 4 Caching a Response in Proxy Cache

84 Proxy Cache Miss
Request (Client to Proxy):
GET /foo/index.html HTTP/1.1
Host:
Request (Proxy to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
Response (Server to Proxy):
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
Response (Proxy to Client):
HTTP/1.1 200 OK
Age: 0
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

85 Proxy Cache Hit
Request (Client to Proxy):
GET /foo/index.html HTTP/1.1
Host:
Response (Proxy to Client, served from the proxy cache):
HTTP/1.1 200 OK
Age: 60
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html

86 Scenario 5 Proxy ignores Private Response

87 Proxy Ignores Private Response
Request (Client to Proxy):
GET /foo/index.html HTTP/1.1
Host:
Request (Proxy to Server, over the Internet):
GET /foo/index.html HTTP/1.1
Host:
Response (Server to Proxy):
HTTP/1.1 200 OK
Last-Modified: Fri, 23 Jul ...
Cache-Control: Private
Content-Length: 3688
Content-Type: text/html
Response (Proxy to Client):
HTTP/1.1 200 OK
Cache-Control: Private
Last-Modified: Fri, 23 Jul ...
Content-Length: 3688
Content-Type: text/html
The proxy will not store a copy of the response.
The client can store a copy of the response.
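The storage decision in this scenario can be sketched as a check on the response's Cache-Control directives. The function name is hypothetical, and only the private and no-store directives are considered here.

```python
def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Decide whether a cache may store a response with this Cache-Control value.

    shared_cache is True for a proxy or gateway cache, False for a browser cache.
    """
    names = {
        part.strip().partition("=")[0].strip().lower()
        for part in cache_control.split(",")
    }
    if "no-store" in names:
        return False  # nobody may store it
    if shared_cache and "private" in names:
        return False  # intended for a single user
    return True


if __name__ == "__main__":
    print("proxy may store: ", may_store("Private", shared_cache=True))
    print("client may store:", may_store("Private", shared_cache=False))
```

Directive names are compared case-insensitively, so the scenario's `Cache-Control: Private` is treated the same as `private`.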

88 Bibliography
HTTP Tutorial (taken from the Training folder of CVS Docs)
RFC 2616 in general, and Chapter 13, Caching in HTTP, in more detail
Chapter 7, Caching, of the book HTTP: The Definitive Guide (O’Reilly)
Caching tutorial: betterexplained.co/articles/how-to-optimize-your-site-with-http-caching/

89 Queries ?

90 Thanks!

