CS193H: High Performance Web Sites
Lecture 16: Rule 13 – Configure ETags
Steve Souders, Google

announcements 11/17 – guest lecturer: Robert Johnson (Facebook), "Fast Data at Massive Scale - lessons learned at Facebook"

Conditional GET (IMS)

sometime after 3pm PT 9/24/08:

GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 (…) Gecko/2008070208 Firefox/3.0.1
Accept-Encoding: gzip,deflate
If-Modified-Since: Mon, 22 Sep 2008 21:14:35 GMT

HTTP/1.1 304 Not Modified

IMS determines validity – does the browser's cached version match what's on the server?
the comparison is based on the resource's date
a 304 response is sent instead of all the data
IMS is used when Reload is pressed
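
The date comparison behind an IMS check can be sketched as follows. This is a minimal illustration, not how any particular server implements it: the function name `ims_status` is hypothetical, and the rule "return 304 when the resource is no newer than the client's date" is an assumption about typical server behavior.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def ims_status(if_modified_since: str, last_modified: datetime) -> int:
    """Return 304 if the resource is no newer than the client's date, else 200."""
    client_date = parsedate_to_datetime(if_modified_since)
    if client_date.tzinfo is None:
        # Treat a date without zone information as GMT (assumption).
        client_date = client_date.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop sub-second precision.
    if last_modified.replace(microsecond=0) <= client_date:
        return 304  # headers only, no body
    return 200  # full response


# The date from the request above, used as the resource's modification time.
resource_date = datetime(2008, 9, 22, 21, 14, 35, tzinfo=timezone.utc)
print(ims_status("Mon, 22 Sep 2008 21:14:35 GMT", resource_date))
```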

Conditional GET (INM)

sometime after 3pm PT 9/24/08:

GET /v-app/scripts/107652916-dom.common.js HTTP/1.1
Host: www.blogger.com
User-Agent: Mozilla/5.0 (…) Gecko/2008070208 Firefox/3.0.1
Accept-Encoding: gzip,deflate
If-Modified-Since: Mon, 22 Sep 2008 21:14:35 GMT
If-None-Match: "19f1e-7920-4525b037f0440"

HTTP/1.1 304 Not Modified

alternative way to test validity

What is an ETag

http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.11
added in HTTP/1.1
used by clients and servers to validate expired resources
more flexible than Last-Modified date
"An entity tag consists of an opaque quoted string"
"An entity tag MUST be unique across all versions of all entities associated with a particular resource."

If-None-Match (hit)

"If any of the entity tags match the entity tag of the entity that would have been returned in the response to a similar GET request (without the If-None-Match header) on that resource[…], then the server MUST NOT perform the requested method, unless required to do so because the resource's modification date fails to match that supplied in an If-Modified-Since header field in the request. Instead, if the request method was GET or HEAD, the server SHOULD respond with a 304 (Not Modified) response,…"
http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.26

INM, IMS hit & miss

                      If-Modified-Since hit   If-Modified-Since miss
If-None-Match hit     304                     full response
If-None-Match miss

If-None-Match (miss)

"If none of the entity tags match, then the server MAY perform the requested method as if the If-None-Match header field did not exist, but MUST also ignore any If-Modified-Since header field(s) in the request. That is, if no entity tags match, then the server MUST NOT return a 304 (Not Modified) response."

INM, IMS hit & miss

                      If-Modified-Since hit   If-Modified-Since miss
If-None-Match hit     304                     full response
If-None-Match miss    full response           full response

if not managed properly, sending both IMS and INM lowers the chances of a simple, small 304 response
How could it not be managed properly?!
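
The hit and miss combinations can be sketched as one decision function. This is a simplified reading of the RFC 2616 text quoted in this lecture, not a full implementation: the name `validation_status` is hypothetical, only a single entity tag is handled, and the date check is an exact string match where a real server would parse and compare dates.

```python
from typing import Optional


def validation_status(
    if_none_match: Optional[str],
    if_modified_since: Optional[str],
    etag: str,
    last_modified: str,
) -> int:
    """Return 304 only when every validator the client sent is a hit."""
    if if_none_match is not None:
        if if_none_match != etag:
            # INM miss: If-Modified-Since must be ignored, so never a 304.
            return 200
        if if_modified_since is not None and if_modified_since != last_modified:
            # INM hit, but the modification date fails to match.
            return 200
        return 304
    # No INM header: fall back to the date check alone.
    if if_modified_since is not None and if_modified_since == last_modified:
        return 304
    return 200


ETAG = '"19f1e-7920-4525b037f0440"'
DATE = "Mon, 22 Sep 2008 21:14:35 GMT"
for inm in (ETAG, '"some-other-tag"'):
    for ims in (DATE, "Sun, 21 Sep 2008 21:14:35 GMT"):
        print(inm == ETAG, ims == DATE, validation_status(inm, ims, ETAG, DATE))
```

Only the combination where both validators hit yields a 304; the other three return the full response.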

Apache ETags

"19f1e-7920-4525b037f0440"
"inode-size-timestamp"
inode – used by filesystems to store file type, owner, group, permissions, etc.
inode for the same file differs across servers even if file size, timestamp, and directory are the same

http://stevesouders.com/images/arrow-right-9x13.png
ETag: "21f5315-d4-5d51f0c0"
http://1.cuzillion.com/images/arrow-right-9x13.png
ETag: "1ee57ec-d4-5d51f0c0"
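
To see which component causes the mismatch, the two ETags for arrow-right-9x13.png can be split into their fields. This is a small sketch: `ApacheETag` and `parse_apache_etag` are hypothetical helper names, and reading each field as a hexadecimal number is an assumption about the default format.

```python
from typing import NamedTuple


class ApacheETag(NamedTuple):
    inode: int
    size: int
    timestamp: int


def parse_apache_etag(etag: str) -> ApacheETag:
    """Split an "inode-size-timestamp" ETag, assuming hexadecimal fields."""
    inode, size, timestamp = (int(part, 16) for part in etag.strip('"').split("-"))
    return ApacheETag(inode, size, timestamp)


first = parse_apache_etag('"21f5315-d4-5d51f0c0"')
second = parse_apache_etag('"1ee57ec-d4-5d51f0c0"')

print(first.size == second.size)            # same file size
print(first.timestamp == second.timestamp)  # same timestamp
print(first.inode == second.inode)          # inode differs, so the ETags differ
```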

IIS ETags

"b4f35327edac51:113f"
"timestamp:changenumber"
changenumber – counter to track IIS configuration changes
changenumber rarely the same across servers

http://hp.msn.com/global/c/hpv10/favicon.ico
ETag: "b4f35327edac51:113f"
ETag: "b4f35327edac51:e6e"

example ETag miss

GET /global/c/hpv10/favicon.ico HTTP/1.1
Host: hp.msn.com
If-Modified-Since: Wed, 26 Oct 2005 22:39:58 GMT
If-None-Match: "b4f35327edac51:19bc"

HTTP/1.x 200 OK
Content-Length: 1406
Etag: "b4f35327edac51:d76"
Last-Modified: Wed, 26 Oct 2005 22:39:58 GMT
Expires: Wed, 06 Feb 2008 01:10:16 GMT

timestamp is the same
Last-Modified matches If-Modified-Since (but IMS is ignored because INM misses)
changenumber differs, validation misses, entire body is resent
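
The miss in this example can be traced to the changenumber by splitting the two ETags. A short sketch, with `split_iis_etag` as a hypothetical helper name:

```python
from typing import Tuple


def split_iis_etag(etag: str) -> Tuple[str, str]:
    """Split a "timestamp:changenumber" ETag into its two parts."""
    timestamp, changenumber = etag.strip('"').split(":")
    return timestamp, changenumber


request_tag = '"b4f35327edac51:19bc"'   # sent in If-None-Match
response_tag = '"b4f35327edac51:d76"'   # returned in the 200 response

req_time, req_change = split_iis_etag(request_tag)
res_time, res_change = split_iis_etag(response_tag)

print(req_time == res_time)         # timestamp part is identical
print(req_change == res_change)     # changenumber differs
print(request_tag == response_tag)  # so the whole-tag comparison misses
```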

the problem with ETags

the default ETag syntax in Apache and IIS makes it unlikely that INM will match across servers, even when the resource is the same
probability of an incorrect INM miss: (n-1)/n, where "n" is the number of servers
not an issue if you just have one server
http://www.apacheweek.com/issues/02-01-18 "can cause an unnecessary performance hit as resources are fetched more often than is required"
http://support.microsoft.com/kb/922703 "IIS 6.0 sends a 200 response because it considers the different change numbers to mean that [the resources] are not the same versions"
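
The (n-1)/n figure can be checked by enumeration. The sketch below assumes that each of the n servers produces a different ETag for the same resource and that every request is equally likely to reach any server; `inm_miss_probability` is a hypothetical name.

```python
from fractions import Fraction


def inm_miss_probability(n: int) -> Fraction:
    """Probability that a revalidation reaches a different server than the original fetch."""
    if n < 1:
        raise ValueError("need at least one server")
    # Enumerate every (server for first fetch, server for revalidation) pair.
    misses = sum(
        1 for first in range(n) for second in range(n) if first != second
    )
    return Fraction(misses, n * n)


for servers in (1, 2, 4, 10):
    print(servers, inm_miss_probability(servers))
```

Of the n × n equally likely pairs, n land on the same server twice, leaving n(n-1) misses, which reduces to (n-1)/n.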

the solution for ETags

if you're not leveraging ETags, turn them off
  reduces size of requests and responses
  reduces outbound traffic from your servers
  increases proxy cache hit rate
Apache: FileETag none
IIS: synchronize changenumber across servers
http://support.microsoft.com/kb/922703/

ETags in the wild

site                      server      ETags?   default syntax?
www.aol.com               AOLserver   no       –
www.ebay.com              IIS         yes
www.facebook.com          Apache      no       –
www.google.com/search     gws         no       –
search.live.com/results   ASP.NET     yes      no
www.msn.com               IIS         no       –
www.myspace.com           Apache      some     no
en.wikipedia.org/wiki     Apache      some     no
                          lighttpd    yes      ?
www.yahoo.com             YTS         no       –
www.youtube.com           btfe        no       –

Homework

11/7 11:59pm – rules 4-10 applied to your "Improving a Top Site" class project
11/12 3:15pm – Web 100 Double Check
  look at your rows in Web 100 spreadsheet
  double-check your entries for any rows in red
  update incorrect entries
  enter "y" in "Double Checked" column
read HPWS Chapter 14

Questions

Why were ETags introduced in HTTP/1.1?
What do "IMS" and "INM" stand for?
How do IMS and INM interplay during resource validation?
What's the default syntax for ETags in Apache and IIS?
What component in each default syntax hurts performance, and why?
What are three performance gains you can achieve by turning off ETags?