HyperText Transfer Protocol

Slides:



Advertisements
Similar presentations
1 Caching in HTTP Representation and Management of Data on the Internet.
Advertisements

HTTP – HyperText Transfer Protocol
Web basics HTTP – – URI/L/Ns – HTML –
1 HTTP – HyperText Transfer Protocol Part 1. 2 Common Protocols In order for two remote machines to “ understand ” each other they should –‘‘ speak the.
HTTP Hypertext Transfer Protocol. HTTP messages HTTP is the language that web clients and web servers use to talk to each other –HTTP is largely “under.
HTTP HyperText Transfer Protocol Part 2.
HTTP Headers. Read these slides yourselves This set of slides explains the header fields which are pre-defined in HTTP/1.1 Read these slides yourselves.
What’s a Web Cache? Why do people use them? Web cache location Web cache purpose There are two main reasons that Web cache are used:  to reduce latency.
The abs_path in a URI If the abs_path is not present in the URL, it must be given as "/" in a Request-URI for a resource. Thus, if a user points a browser.
HTTP HyperText Transfer Protocol.
1 HTTP - Hypertext Transfer Protocol Arthur : Yigal Eliaspur Date :
Web, HTTP and Web Caching
HTTP HyperText Transfer Protocol Part 3.
HTTP Overview Vijayan Sugumaran School of Business Administration Oakland University.
The abs_path in a URI If the abs_path is not present in the URL, it must be given as "/" in a Request-URI for a resource. Thus, if a user points a browser.
Hypertext Transport Protocol CS Dick Steflik.
Nikolay Kostov Telerik Corporation
 What is it ? What is it ?  URI,URN,URL URI,URN,URL  HTTP – methods HTTP – methods  HTTP Request Packets HTTP Request Packets  HTTP Request Headers.
Rensselaer Polytechnic Institute CSC-432 – Operating Systems David Goldschmidt, Ph.D.
Introduction to Web Programming Fall 2014/2015 Some slides are based upon Web Technologies course slides, HUJI, 2009 Extended System Programming Laboratory.
COMP3016 Web Technologies Introduction and Discussion What is the Web?
Krerk Piromsopa. Web Caching Krerk Piromsopa. Department of Computer Engineering. Chulalongkorn University.
HTTP Protocol Specification
Web Caching: Replication on the World Wide Web Jonathan Bulava CSC8530 – Distributed Systems Dr. Paul Schragger.
FTP (File Transfer Protocol) & Telnet
Simple Web Services. Internet Basics The Internet is based on a communication protocol named TCP (Transmission Control Protocol) TCP allows programs running.
HyperText Transfer Protocol (HTTP).  HTTP is the protocol that supports communication between web browsers and web servers.  A “Web Server” is a HTTP.
1 Lecture #7-8 HTTP – HyperText Transfer Protocol HAIT Summer 2005 Shimrit Tzur-David.
CP476 Internet Computing Lecture 5 : HTTP, WWW and URL 1 Lecture 5. WWW, HTTP and URL Objective: to review the concepts of WWW to understand how HTTP works.
Application Layer 2 Figures from Kurose and Ross
Rensselaer Polytechnic Institute Shivkumar Kalvanaraman, Biplab Sikdar 1 The Web: the http protocol http: hypertext transfer protocol Web’s application.
Maryam Elahi University of Calgary – CPSC 441.  HTTP stands for Hypertext Transfer Protocol.  Used to deliver virtually all files and other data (collectively.
Sistem Jaringan dan Komunikasi Data #9. DNS The Internet Directory Service  the Domain Name Service (DNS) provides mapping between host name & IP address.
©Yaron Kanza HTTP Written by Dr. Yaron Kanza, Edited with permission from author by Liron Blecher.
Web Server Design Week 8 Old Dominion University Department of Computer Science CS 495/595 Spring 2010 Martin Klein 3/3/10.
HyperText Transfer Protocol (HTTP) RICHI GUPTA CISC 856: TCP/IP and Upper Layer Protocols Fall 2007 Thanks to Dr. Amer, UDEL for some of the slides used.
HTTP1 Hypertext Transfer Protocol (HTTP) After this lecture, you should be able to:  Know how Web Browsers and Web Servers communicate via HTTP Protocol.
Web Server Design Week 4 Old Dominion University Department of Computer Science CS 495/595 Spring 2010 Martin Klein 2/03/10.
1 Caching in HTTP Representation and Management of Data on the Internet.
CIS679: Lecture 13 r Review of Last Lecture r More on HTTP.
Web Cache Consistency. “Requirements of performance, availability, and disconnected operation require us to relax the goal of semantic transparency.”
HTTP Protocol Design1 HTTP - timeline r Mar 1990 CERN labs document proposing Web r Jan 1992 HTTP/0.9 specification r Dec 1992 Proposal to add MIME to.
Operating Systems Lesson 12. HTTP vs HTML HTML: hypertext markup language ◦ Definitions of tags that are added to Web documents to control their appearance.
Web Server Design Week 7 Old Dominion University Department of Computer Science CS 495/595 Spring 2010 Martin Klein 2/24/10.
HTTP Here, we examine the hypertext transfer protocol (http) – originally introduced around 1990 but not standardized until 1997 (version 1.0) – protocol.
Web Services. 2 Internet Collection of physically interconnected computers. Messages decomposed into packets. Packets transmitted from source to destination.
EE 122: Lecture 21 (HyperText Transfer Protocol - HTTP) Ion Stoica Nov 20, 2001 (*)
Overview of Servlets and JSP
Web Server Design Week 6 Old Dominion University Department of Computer Science CS 495/595 Spring 2010 Martin Klein 2/17/10.
COMP2322 Lab 2 HTTP Steven Lee Jan. 29, HTTP Hypertext Transfer Protocol Web’s application layer protocol Client/server model – Client (browser):
HTTP HyperText Transfer Protocol.
Web Protocols: HTTP COMP6017 Topics on Web Services Dr Nicholas Gibbins –
Simple Web Services. Internet Basics The Internet is based on a communication protocol named TCP (Transmission Control Protocol) TCP allows programs running.
INTRODUCTION Dr Mohd Soperi Mohd Zahid Semester /16.
© Janice Regan, CMPT 128, Jan 2007 CMPT 371 Data Communications and Networking HTTP 0.
HTTP Protocol Amanda Burrows. HTTP Protocol The HTTP protocol is used to send HTML documents through the Internet. The HTTP protocol sends the HTML documents.
What’s Really Happening
Hypertext Transfer Protocol (HTTP) COMP6218 Web Architecture Dr Nicholas Gibbins –
Hypertext Transfer Protocol
Tiny http client and server
How HTTP Works Made by Manish Kushwaha.
HyperText Transfer Protocol
HTTP – An overview.
HTTP request message: general format
HTTP Headers.
Hypertext Transport Protocol
Web Caching? Web Caching:.
EE 122: HyperText Transfer Protocol (HTTP)
Web Server Design Week 5 Old Dominion University
HTTP Hypertext Transfer Protocol
Presentation transcript:

HyperText Transfer Protocol HTTP HyperText Transfer Protocol 2007 cs236607

Transfer of resources is using HTTP The World-Wide Web Browser Browser HTML Server HTML CSS Server JS CSS JS Transfer of resources is using HTTP 2007 cs236607

Browser-HTTPD Interaction index.html Web Server user requests http:// www.google.com host www.google.com Browser Files 2007 cs 236607

Establishes a TCP Connection Receives an HTTP Response The Browser How? Gets an IP Address To which port? Establishes a TCP Connection Web Server Sends an HTTP Request Receives an HTTP Response Presents a Page Can it present the page now? 2007 cs236607

Establishes a TCP Connection Receives an HTTP Request The Server To what? Listens Establishes a TCP Connection Web Server Receives an HTTP Request Sends an HTTP Response ??? Is that all? 2007 cs236607

Universal Resource Location protocol://host:port/path#anchor?parameters protocol://host:port/path#anchor?parameters protocol://host:port/path#anchor?parameters protocol://host:port/path#anchor?parameters protocol://host:port/path#anchor?parameters protocol://host:port/path#anchor?parameters http://www.cs.technion.ac.il/~cs236607/index.html http://www.google.com/search?hl=en&q=blabla Parameters appear in URLs of dynamic pages Are URLs good identifiers? Can they be used as keys of resources? 2007 cs236607

URL, URN and URI URL is Universal Resource Location URN is Universal Resource Name Independent of a specific location, e.g., urn:ietf:rfc:3187 URI is either a URN or a URL There are many possible formats to URI’s mailto:<account@site> news:<newsgroup-name> http://www.cs.technion.ac.il/~cs236607#key123456 2007 cs236607

Terminology Web Server is an implementation of an HTTP Daemon (either HTTP/1.0 or HTTP/1.1) User Agent (UA) is a client (e.g., browser) Origin Server is the server that has the resource that is requested by a client Proxy acts on behalf of a client Reverse Proxy acts on behalf of a server 2007 cs236607

Proxy Servers Sometimes, a browser sends its request via a proxy? The goals: Improve Web traffic Add anonymity How does the proxy affects HTTP message exchange? How does it change messages? Can the browser affect the behavior of the proxy? Can the Web server affect the behavior of the proxy? 2007 cs236607

HTTP Request Proxy Server HTTP Request HTTP Response HTTP Response http://www.google.com HTTP Response Web Server www.google.com:80 The proxy can serve the resource from its own cache, if it is there, without sending the request to the origin server File System 2007 cs236607

Department Proxy Server Proxy Caches reduce latency for a given user agent if they can serve the request from their cache. As a result, they also save bandwidth and reduce the load on the origin server. Technion Proxy Server Therefore, they reduce latency also for requests that must be sent to the origin server. Israel Proxy Server Web Server www.google.com:80 2007 cs236607

Main Features of HTTP Stateless Persistent connection (in HTTP/1.1) Pipelining (in HTTP/1.1) Caching (improved in HTTP/1.1) Compression negotiation (improved in 1.1) Content negotiation (improved in 1.1) Interoperability of HTTP/1.0 and HTTP/1.1 2007 cs236607

Requests and Responses A UA sends a request and gets back a response Requests and responses have headers HTTP 1.0 defines 16 headers None is required HTTP 1.1 defines 46 headers The Host header is required in all requests 2007 cs236607

Hop-by-Hop vs. End-to-End HTTP requests and responses may travel between the UA and the origin server through a series of proxies Thus, in an HTTP connection there is a distinction between Hop-by-Hop, and End-to-End Some headers are hop-by-hop and some are end-to-end (in HTTP/1.1) Each hop is a separate TCP connection 2007 cs236607

How is the Chain of Proxies Discovered? A browser sends requests to the proxy that is specified in the browser settings Alternatively, Web proxies can be automatically discovered, for example the router redirects all HTTP requests to the proxy (“transparent caching”) Each proxy knows the address of the next proxy along the way to the origin server 2007 cs236607

Interoperability Even if the UA and the origin server comply with HTTP/1.1, some proxies along the way may only comply with HTTP/1.0 The design of HTTP/1.1 had to take it into account We will point out features of HTTP/1.1 that were introduced to ensure interoperability with HTTP/1.0 How can HTTP support both backward (to the past) and forward (to the future) interoperability? 2007 cs236607

Note HTTP (both 1.0 and 1.1) has always specified that an implementation should ignore a header that it does not understand The header should not be deleted – just ignored! This rule allows extensions by means of new headers, without any changes in existing specifications 2007 cs236607

Requests 2007 cs236607

The Format of a Request Entity (Message Body( lines method sp URI version cr lf header : value cr lf lines cr lf Entity (Message Body( The URI is specified without the host name, unless the request is sent to a proxy 2007 cs236607

An Example of a Request method request URI GET /index.html HTTP/1.1 Accept: image/gif, image/jpeg User-Agent: Mozilla/4.0 Host: www.cs.technion.ac.il:80 Connection: Keep-Alive [blank line here] version headers 2007 cs236607

2007 cs236607

Common Request Methods GET returns the content of a resource HEAD only returns the headers POST sends data to the given URI OPTIONS requests information about the communication options available for the given URI, such as supported content types * instead of a URI requests information that applies to the given Web server in general OPTIONS is not fully specified 2007 cs236607

Additional Request Methods PUT replaces the content of the given URI or generates a new resource at the given URI if none exists DELETE deletes the resource at the given URI TRACE invokes a remote loop-back of the request The final recipient should reflect the message back to the client CONNECT switches the proxy to become a tunnel Do servers really support PUT or DELETE? 2007 cs236607

Range and Conditional Requests (Usually GET) Range requests are requests with the Range header (only in HTTP/1.1) Conditional requests are related to caching and they use the following headers (some only in HTTP/1.1) If-Unmodified-Since If-Modified-Since If-Match If-None-Match If-Range 2007 cs236607

Where Do Request Headers Come From? The UA sends headers with each request The user may determine some of these headers through the browser configuration Proxies along the way may add their own headers and delete existing (hop-by-hop) headers 2007 cs236607

The Host Header in Requests (It is Required in HTTP/1.1 but not in HTTP/1.0) 2007 cs236607

Why is the Host Header Required in HTTP/1.1? If the URL is http://www.example.com/home.html, then the HTTP/1.0 syntax is GET /home.html HTTP/1.0 and the TCP connection is to port 80 at the IP address corresponding to www.example.com Why is the Host Header Required in HTTP/1.1? 2007 cs236607

Why is the Host Header Required in HTTP/1.1? In HTTP/1.0, there can be at most one HTTP server per IP address This wastes IP addresses, since companies like to use many “vanity URLs” (that is, URLs that only consist of hostnames) In HTTP/1.1, requests to different HTTP servers can be sent to port 80 at the same IP address, since each request contains the host name in the Host header Why is the Hostname not in the URL? 2007 cs236607

Why is the Hostname not in the URL? To ensure interoperability with HTTP/1.0 An HTTP/1.0 server will incorrectly process a request that has an absolute URL (i.e., a URL that includes the hostname) An HTTP/1.1 must reject any HTTP/1.1 (but not HTTP/1.0) request that does not have the Host header 2007 cs236607

Responses 2007 cs236607

The Format of a Response version sp status code phrase cr lf status line header : value cr lf lines cr lf Entity (Message Body) 2007 cs236607

An Example of a Response version status code status phrase HTTP/1.0 200 OK Date: Fri, 31 Dec 1999 23:59:59 GMT Content-Type: text/html Content-Length: 1354 <html> <body> <h1>Hello World</h1> (more file contents) . . . </body> </html> headers message body 2007 cs236607

2007 cs236607

Status Codes in Responses The status code is a three-digit integer, and the first digit identifies the general category of response: 1xx indicates an informational message 2xx indicates success of some kind 3xx redirects the client to another URL 4xx indicates an error on the client's part Yes, the system blames it on the client if a resource is not found (i.e., 404) 5xx indicates an error on the server's part 2007 cs236607

Where Do Response Headers Come From? The Web server, based on its settings, determines some headers Applications that create dynamic pages may add additional headers Proxies along the way may add their own headers and delete existing (hop-by-hop) headers 2007 cs236607

Where Do Status Codes Come From? Web servers and applications creating dynamic pages determine status codes It is important to configure Web servers and write applications creating dynamic pages so that they will return correct, meaningful and useful status codes and headers 2007 cs236607

Apache HTTP Server Apache lets each user put an .htaccess file in her www directory The .htaccess file applies to all subdirectories as well, unless it is overridden by .htaccess files in those subdirectories The .htaccess file may contain commands that add headers to responses (as well as commands that do other things) 2007 cs236607

Tomcat Tomcat is a simple web server that we will use in this course In Tomcat, configuration of HTTP response headers is in the server.xml file 2007 cs236607

Setting HTTP Headers for Dynamically Generated Content Headers can be set by using appropriate methods, e.g., myServlet.setContentType(…) myServlet.setContentLength(…) 2007 cs236607

META HTTP-EQUIV Tags The browser interprets these tags as if they were headers in the HTTP response For example <META HTTP-EQUIV=“Refresh” CONTENT=“5; URL=http://host/path/”> If the value is 0 (instead of 5) and there is no URL parameter, the same page is continuously refreshed, causing the Back button to stop working 2007 cs236607

META HTTP-EQUIV Tags are Only Read by Browsers META HTTP-EQUIV tags are interpreted by browsers Proxies usually don’t read the HTML documents – they only read the headers of the HTTP requests and responses Therefore, cache-control headers in META HTTP-EQUIV tags actually apply only to the browser’s cache 2007 cs236607

Examples 2007 cs236607

[kanza@csa ~]$ telnet www.cs.technion.ac.il 80 Trying 132.68.32.15... Connected to csn.cs.technion.ac.il (132.68.32.15). Escape character is '^]'. GET /~kanza/test.html HTTP/1.0 HTTP/1.1 200 OK Date: Wed, 16 Jan 2008 00:10:20 GMT Server: Apache/2.0.54 (Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g PHP/5.0.4 DAV/2 mod_perl/1.999.21 Perl/v5.8.6 Last-Modified: Wed, 16 Jan 2008 00:07:33 GMT ETag: "9a42e-79-53ebbb40" Accept-Ranges: bytes Content-Length: 121 Connection: close Content-Type: text/html <html> <head> <title>Test for cs236607</title> </head> <body> This page is being used for testing HTTP. </body> </html> Connection closed by foreign host. [kanza@csa ~]$ 2007 cs236607

[kanza@csa ~]$ telnet www.cs.technion.ac.il 80 Trying 132.68.32.15... Connected to csn.cs.technion.ac.il (132.68.32.15). Escape character is '^]'. GET /~kanza/test.html HTTP/1.1 Host: www.cs.technion.ac.il HTTP/1.1 200 OK Date: Wed, 16 Jan 2008 00:28:48 GMT Server: Apache/2.0.54 (Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g PHP/5.0.4 DAV/2 mod_perl/1.999.21 Perl/v5.8.6 Last-Modified: Wed, 16 Jan 2008 00:07:33 GMT ETag: "9a42e-79-53ebbb40" Accept-Ranges: bytes Content-Length: 121 Content-Type: text/html <html> <head> <title>Test for cs236607</title> </head> <body> This page is being used for testing HTTP. </body> </html> Connection closed by foreign host. [kanza@csa ~]$ 2007 cs236607

[kanza@csa ~]$ telnet www.cs.technion.ac.il 80 Trying 132.68.32.15... Connected to csn.cs.technion.ac.il (132.68.32.15). Escape character is '^]'. GET /~kanza/test.html HTTP/1.1 HTTP/1.1 400 Bad Request Date: Wed, 16 Jan 2008 00:31:20 GMT Server: Apache/2.0.54 (Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g PHP/5.0.4 DAV/2 mod_perl/1.999.21 Perl/v5.8.6 Content-Length: 387 Connection: close Content-Type: text/html; charset=iso-8859-1 <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN"> <html><head> <title>400 Bad Request</title> </head><body> <h1>Bad Request</h1> <p>Your browser sent a request that this server could not understand.<br /> </p> <hr> <address>Apache/2.0.54 (Unix) mod_ssl/2.0.54 OpenSSL/0.9.7g PHP/5.0.4 DAV/2 mod_perl/1.999.21 Perl/v5.8.6 Server at www.cs.technion.ac.il Port 80</address> </body></html> Connection closed by foreign host. [kanza@csa ~]$ 2007 cs236607

Persistent Connections and Pipelining HTTP/1.1 Supports Both 2007 cs236607

Nesting in Page What we see on the browser can be a combination of several resources HTML Code Images Style Sheet … What is wrong with a naïve retrieval of the resources? How can we improve the efficiency of presenting a page? 2007 cs236607

The faculty’s homepage requires seven HTTP requests HttpWatch 2007 cs236607

The Problem Typically, each resource consists of several files, rather than just one Each file requires a separate HTTP request HTTP/1.0 requires opening a new TCP connection for each request TCP has a slow start and therefore, opening a series of new connections is inefficient 2007 cs236607

Persistent Connections are the Default in HTTP/1.1 In HTTP/1.1, several requests can be sent on the same TCP connection The slow-start overhead is incurred only once per resource A connection is closed if it remains idle for a certain amount of time Alternatively, the server may decide to close it after sending the response If so, the response should include the header Connection: close 2007 cs236607

Pipelining When the connection is persistent, the next request can be sent before receiving the response to the previous request Actually, a client can send many requests before receiving the first response Performance can be greatly improved No need to wait for network round-trips 2007 cs236607

Best-Possible Use of TCP A Client sends requests in some given order TCP guarantees that the requests are received in the order that they were sent The server sends responses in the order that it received the corresponding requests TCP guarantees that responses are received in the order that they were sent Thus, the client knows how to associate the responses with its requests 2007 cs236607

But a TCP Connection is Just a Byte Stream So, how does the client know where one response ends and another begins? Parsing is inefficient and anyhow will not work (why?) The server must add the Content-Length header to the response or else it must close the connection after sending the response Will it work for dynamic pages? 2007 cs236607

Sending Dynamic Pages A server has to buffer a whole dynamic page to know its length (and only then the server can send the page) The latency is increased Alternatively, the server can break an entity into chunks of arbitrary length and send these chunks in a series of responses Only one chunk at-a-time has to be buffered 2007 cs236607

Chunked Transfer Encoding Each chunk is sent in a separate message that includes the header Transfer-Encoding: Chunked and also includes the length of the chunk in the Content-Length header A zero-length chunk marks the end of the message 2007 cs236607

Trailers If an entity is sent in chunks, some header values can be computed only after the whole entity has been sent The first chunk includes a Trailer header that lists all the headers that are deferred until the trailer A server cannot send a trailer unless the information is purely optional, or the client has sent the header TE: trailers 2007 cs236607

The Content-Length Header in Requests The Content-Length header is also applicable to POST and PUT requests 2007 cs236607

More on the Connection Header The Connection header may contain connection tokes, e.g., close (discussed earlier) This header also lists all the hop-by-hop headers, thereby telling the recipient that all these headers must be removed before forwarding the message 2007 cs236607

Interoperability Rule in HTTP/1.1 If a Connection header is received in an HTTP/1.0 message, it means that it was incorrectly forwarded by an HTTP/1.0 proxy Therefore, all the headers it lists were incorrectly forwarded and must be ignored 2007 cs236607

Caching in HTTP 2007 cs236607

Type of Web Caches Browser Caches Proxy Caches A portion of the hard disk is used to store representations of resources that have already been displayed If a resource is requested again (for example, by hitting the “back” button), the request is served from the browser cache Proxy Caches These are shared caches – they serve many users 2007 cs236607

Proxy Caches GET /fruit/apple.gif GET /fruit/apple.gif client server GET /fruit/apple.gif proxy server client GET /fruit/apple.gif GET /fruit/apple.gif We can generalise the caching concept and insert a proxy server to cache objects of multiple clients. The client is configured to open all its TCP connections to the proxy. Recall that the http request is GET /fruit/apple.gif HTTP/1.1 Host: www.fruitbook.edu The request contains the full url. The proxy checks to see if it has the object. If not it opens a tcp connection to the host and acts as a client fetching the object. The object is both stored and returned to the original client. A second request made while the object is cached is satisfied without going to the server - but a conditional get is done to confirm the object is current. Introduces many new problems of cache management. server client 2007 cs236607

Benefit of Caching 24%-32% hit rate is possible, client 10Mbps LAN server client 1.5Mbps Internet R R server 15 req/sec 100Kbits/req Let’s consider a typical set up with a number of clients on a LAN with a 1.5Mbps link to the Internet. They access servers all over the world. Assume the total number of requests is 15/sec and the average object is 100Kbits. The load is 1.5Mbps. This won’t worry the LAN but will sink the access to the Internet. It utilisation will be 1 and delays will be arbitrarily long. Now add a proxy server. Proxies typically achieve between 20% and 60% hit rates. Let’s assume 40%. The load on the LAN doesn’t change, but the load on the access link is reduces .9Mbps allowing satisfactory performance. proxy server 24%-32% hit rate is possible, since many users share the cache and, therefore, there is a large number of shared hits client 2007 cs236607

Reasons for Using Web Caches Web caches reduce latency Since the cache is closer to the client, it takes less time for the client to get the resource and display it Web caches save bandwidth Since a resource has to be brought from the server just once, clients that need this resource consume less bandwidth 2007 cs236607

More Reasons for Using Web Caches Web caches reduce the load on servers (for the same reason that they save bandwidth) Since bandwidth is saved and server load is reduced, the latency is reduced for everyone Web caches give some measure of redundancy 2007 cs236607

For example, how much traffic is saved if the Google icon is not sent back with each search result? 2007 cs236607

Points to Consider When Designing a Web Site Caches can help the Web site to load faster Caches may “hide” the users of the Web site, making it difficult to see who is using the site Caches may serve content that is out of date, or stale Do commercial web sites like caches? 2007 cs236607

Terminology Representations are copies of resources that are stored in caches actually, caches store complete responses, including headers If a request is served from a cache, then it should be semantically transparent, that is, it should be the same as a request that is served from the origin server A representation is fresh if it is identical to the resource that is available at the origin server If it is not identical, then it is stale 2007 cs236607

The Risk in Caching and How to Avoid It Responses might not be semantically transparent The cache should determine that the representation is fresh before sending it to the client If it is not fresh, the cache should forward the request to the origin server or to another cache 2007 cs236607

Caching Improves Latency and Saves Bandwidth in Two Ways In some cases, caching eliminates the need to send requests to the origin server by using an expiration mechanism In other cases, caching eliminates the need to return full responses from the origin server by using a validation mechanism 2007 cs236607

An Example of Using a Validation Mechanism Client: GET /fruit/apple.gif client Server responds with Last-Modified-Date: ... cache Client caches object and last-modified-date Server returns either 304 Not Modified or resource Client sends GET /fruit/apple.gif … If-Modified-Since: … Replication of data plays a key role in all distributed systems, not just the web. Reduces latency by providing a local copy Reduces load on the server by increasing concurrent access Caching is a form of replication. But we now have a problem - how do we keep the replicated (cached) data consistent. In the web context we have a less general problem of ensuring that the client sees only up-to-date information. Handled by HTTP’s conditional GET facility. A message of the form: GET url http/1.1 If-Modified-Since: date Only returns the object if it has been modified. The date sent with the request is just the date provided when the object was originally fetched. The server returns HTTP/1.1 304 Not Modified server 2007 cs236607

Validating an Object If the object is stale (i.e., not fresh), the cache will ask the origin server to validate the object In response, the origin server will either tell the cache that the object has not changed, or send a new copy of the object to the cache 2007 cs236607

Validation Mechanisms If-modified-since last-modified date Cannot be used with dynamic pages ETags can be used for dynamic pages and also when a site cycles through several possible responses 2007 cs236607

Are there Limitations on what to Store in Cache? Should a proxy store in the cache all the responses it ever received? 2007 cs236607

The Following Resources are not Cached The headers of a response tell the cache not to keep the resource The response has no validator (i.e., an Expires value, a Max-Age value, a Last-Modified value or an ETag) The resource is authenticated or secured Furthermore, it is difficult to cache dynamic pages and pages with cookies 2007 cs236607

Fresh Objects Are Served From the Cache An object is fresh in the following cases: The object has an expiry time or other age-controlling directive, and is still within the fresh period The browser cache has already seen the object, and has been set to check for newer versions once a session A proxy cache has received the object recently, and the object was modified relatively long ago (this is a heuristic – see later) 2007 cs236607

The Expires HTTP Header A response may include an Expires header: Expires: Fri, 31 Oct 2008 14:19:41 GMT If an expiry time is not specified, the cache can heuristically estimate the expiry time 2007 cs236607

Expiration Model Section 13.2 of RFC 2616 The Expires header cannot be used correctly if there is a clock skew and the resource is fresh for only a short time The header Cache-Control: Max-Age is used to calculate the freshness lifetime: freshness_lifetime = max_age_value If there is no max-age directive, then freshness_lifetime = expires_value – date_value All the information comes form the origin server; hence, not vulnerable to clock skew 2007 cs236607

Age Calculations (Sec. 13.2.3) When a proxy sends a response that is obtained from its cache, it must calculate (an upper bound on) the age and include it in the Age response header The calculation uses values specified in the headers of the cached message and the proxy’s own clock The calculation adds the resident time + an upper bound on the transmission time to the an upper bound on the received age Is it always a reliable (correct) calculation? What happens if some proxy along the way runs HTTP/1.0? 2007 cs236607

Age Calculations (Sec. 13.2.3) The freshness lifetime (from the previous slide) is compared with the age to determine if the response is still fresh (and, hence, can be sent) 2007 cs236607

A Possible Heuristic If the cache received the object 10 hours after it was last modified, then it can heuristically determine that the expiry time is 1 hour after it has received it In general, add 10% (or some other value) of the interval between the last-modification time (given by the Last-Modified header) and the time it was received 2007 cs236607

The Cache-Control Header (Introduced in HTTP 1.1) The following are possible values for the Cache-Control header in responses max-age=<seconds> Specifies the maximum amount of time that an object will be considered fresh (similar to, but overrides the Expires header) s-maxage=<seconds> Similar to max-age, except that it only applies to proxy (shared) caches 2007 cs236607

More Possible Values for the Cache-Control Header public Document is cacheable even if normal rules say that it shouldn’t be (e.g., authenticated document) private The document is for a single user and can only be stored in private (non-shared) caches no-store (may also appear in requests) The response should never be cached and should not even be stored in a temporary location on a disk (this value is intended to prevent inadvertent copies of sensitive information) 2007 cs236607

More Possible Values for the Cache-Control Header must-revalidate Tell caches that they must obey any freshness information provided with the object (HTTP allows caches to take liberties with the freshness of objects) proxy-revalidate Similar to must-revalidate, except that it only applies to proxy (shared) caches 2007 cs236607

No-Cache Some values of the Cache-Control header are meaningful in either responses or requests no-cache In a response, it means not to use the response again without revalidation (this value can apply to cache directive headers; see Sec. 14.9 of RFC2616) In a request, it means to bring a copy from the origin server (i.e., not to use a cache) 2007 cs236607

More Possible Values for the Cache-Control Header in Requests max-age=<seconds> The response should not be older than the given value max-stale=<seconds> The response could exceed its expiration time by the specified amount min-fresh=<seconds> The response should remain fresh for at least the specified amount of time See Sec. 14.9 of RFC2616 for more details 2007 cs236607

The Pragma Header In a request, the header Pragma: no-cache is the same as Cache-Control: no-cache Don’t use Pragma – its meaning is specified only for requests and it is used just for compatibility with HTTP/1.0 For interoperability, it is safer to set both the Pragma and the Cache-Control response headers to the value no-cache 2007 cs236607

The Reload (Refresh) Button Hitting the reload button in the browser brings a copy from a shared cache, but not necessarily from the origin server There is no 100% guarantee that this is a fresh copy Hitting Shift+Reload brings a 100%-guaranteed fresh copy (i.e., from the origin server) 2007 cs236607

How Can a Client Force a Fresh Copy? A fresh copy is obtained from the origin server if the request includes the following header Cache-Control: no-cache The proxy must revalidate its copy with the origin server if the following header is included in the request Cache-Control: max-age=0 2007 cs236607

Who Adds Cache-Control Headers? The server The configuration of the server determines which cache-control headers are added to responses The author of the page can add headers by means of the .htaccess file (only in the Apache server) The application that generates dynamic pages, e.g., servlets, ASP, PHP 2007 cs236607

Cache-Control in HTTP-EQUIV The author of the page can add, to the document itself, a cache-control header by means of the META HTTP-EQUIV tag <meta http-equiv=“cache-control” content =“no cache”> But usually only the browser interprets this tag Proxies along the way don’t read it, since they don’t read the document 2007 cs236607

Validators A validator is any mechanism that may help in determining whether a copy is fresh or stale A strong validator is, for example, a counter that is incremented whenever the resource is changed A weak validator is, for example, a counter that is incremented only when a significant change is made For example, a weak validator may not change if the only change in the site is the number of visitors … 2007 cs236607

Last-Modified Header The most common validator is the time when the document was last changed, the last-modified time It is given by the Last-Modified header In principle, this header should be included in every response; however, there is no last-modified time for dynamic pages It is a weak validator if an object can change more than once within a one-second interval 2007 cs236607

ETag (Entity Tag) ETag is a strong validator (i.e., a unique identifier) generated by the server It is part of the HTTP/1.1 specification (not available in HTTP/1.0) The specification does not say how to generate it The preferred behavior for an HTTP/1.1 origin server is to send both an ETag header and a Last-Modified header 2007 cs236607

Conditional Requests The conditional headers are If-Modified-Since If-Unmodified-Since If-Match If-None-Match If-Range These headers are used to validate an object (i.e., check with the origin server whether the object has changed) 2007 cs236607

If-Modified-Since Header The If-Modified-Since header is used with a GET request If the requested resource has been modified since the given date, the server returns the resource as it normally would (i.e., the header is ignored) Otherwise, the server returns a 304 Not Modified response, including the Date header, but with no message body HTTP/1.1 304 Not Modified Date: Fri, 31 Dec 1999 23:59:59 GMT [blank line] 2007 cs236607

If-None-Match Header A cache may store several responses for the same URI, each having a different ETag A server may cycle through a set of possible responses The cache sends a request with a list of ETags in the header If-none-match If no ETag on the list matches the resource’s current ETag, the server returns a normal response Otherwise, the server returns a response with 304 (Not Modified) and an ETag header that indicates which cache entry is currently valid 2007 cs236607

If-Unmodified-Since Header The If-Unmodified-Since header can be used with any method If the resource has not been modified since the given date, the server returns the same response as it normally would Otherwise, the server returns a 412 Precondition Failed response HTTP/1.1 412 Precondition Failed [blank line] 2007 cs236607

More on Conditional Requests The following conditional headers are useful in requests that are more complex than just a simple GET request; for example, in range requests If-Unmodified-Since If-Match If-Range 2007 cs236607

The Vary Header A response may depend on some header fields of the request For example, the Accept-Language and the Accept-Charset headers determine the specific response The Vary header in a response lists all the relevant selecting header fields of the request 2007 cs236607

Finding Relevant Cache Entries A cache stores responses using the URI as a key A cache can return a stored response if The URI of the new request matches the URI of stored response The selecting headers of the new request match the selecting header fields in the Vary header of the stored response 2007 cs236607

No Transform Sometimes proxies transform responses (for example, to reduce image size before transmitting over a slow link) Some responses cannot be blindly transformed without losing information The no-transform directive in the Cache-Control header is used to prevent transformations (it applies to both requests and responses) 2007 cs236607

Authentication 2007 cs236607

Does it also allow the user application authenticate the server? Restrict Access Some applications should restrict access to authorized users only IP-address-based Access is permitted only to certain IP addresses Form-based The first page shown to the user is a form that requests for a password HTTP Basic Does it also allow the user application authenticate the server? 2007 cs236607

name;password encoded in Base64 HTTP Basic The user tries to access the page The server response is HTTP/1.1 401 Unauthorized WWW-Authenticate: Basic realm=“Description of the restricted site” The browser pops up a prompt window asking for a user name and password The user input is encoded and sent to the server Authorization: Basic emFjaGFyawFzOMFwcGxcGlCg== If authorization succeeds, resources are sent to the browser name;password encoded in Base64 2007 cs236607

Sessions 2007 cs236607

HTTP is Stateless Theoretically, each request-response is an independent interaction How can we implement an online store Payment and shipment are according to the state of some virtual shopping cart Does persistent connection provide a solution? 2007 cs236607

Sessions A session is a sequence of related interactions between a client and a server A session allows responses to be according to a state A shared state can be shared by several users A session state is a state of a single user A transient state is a refers to a single interaction 2007 cs236607

Implementing Sessions URL Rewriting Hidden Form Fields Cookies 2007 cs236607

Advanced Topics 2007 cs236607

Bandwidth Optimization Range requests Expect and 100 (Continue) Compression 2007 cs236607

Range Requests A range request uses the Range header for specifying the requested portions of a resource A range response is returned with the Content-Range header that specifies the offset and length of the returned range The multipart/byteranges MIME type allows the transmission of multiple ranges in one response 2007 cs236607

When to Use Range Requests To read the initial part of an object For example, if the object is an image, reading the initial part provides the information for doing the layout To complete a response transfer that was interrupted (either by the user or by network failure) To read the tail of a growing object 2007 cs236607

Range Requests and Caching A range response is returned with the status code 206 (Partial Content) This prevents HTTP/1.0 proxies from accidentally treating the response as a full one, and using it later as a cached response 2007 cs236607

Conditional Range Requests To request conditionally the prefix of a resource, the If-None-Match header can be used This happens when the client has a response containing the prefix in its cache, and the client wants to validate that response 2007 cs236607

The If-Range Header Sometimes the client’s cache may have the object, but without the requested range Hence, the client sends a range request The server should return the requested range if the object has not changed Otherwise, the server should send back a full response 2007 cs236607

The Clients Wants the Range only if the Object has not Changed The client sends a range request with the If-Match header The server returns the the range (i.e., normal) response if the object has not changed Otherwise, the server returns 412 (Precondition Failed) and the client should send a new request for the full object Two requests might be needed The If-Range header does the above interaction in one request 2007 cs236607

Expect and 100 (Continue) A request (e.g., POST) may contain a large object Sometimes there is no need to send the object to find out that the request fails For example, if the client lacks authorization, or the server is too busy In HTTP/1.1, the client can send just the headers and wait for the server’s indications that it can also send the object 2007 cs236607

The Expect Header The client must include the new header Expect: 100 with the rest of the headers that it initially sends (why?) The server should respond with the status code 100 (Continue), or with the usual status code if it cannot handle the request HTTP/1.1 has some rules for avoiding infinite waits by clients or wasted bandwidth 2007 cs236607

Compression HTTP/1.1 makes a clear distinction between end-to-end encoding (the Content-Encoding response header) and hop-by-hop encodings (the Transfer-Encoding response header) A client uses the Accept-Encoding for specifying the content encodings that it can handle and the ones it prefers The client uses the TE header similarly for transfer encodings 2007 cs236607

Accept-Language: en, fr;q=0.5, da;q=0.1 Content Negotiations Server-driven content negotiation The client sends its preferences using the headers Accept-Language, Accept-Charset, etc. The server chooses the representation that best matches the client’s preferences The headers controlling content negotiations may include wildcards and quality values (qvalues) between 0.0 and 1.0 Accept-Language: en, fr;q=0.5, da;q=0.1 2007 cs236607

Agent-Driven Content Negotiation When the client request a varying resource, the server replies with a 300 (Multiple Choices) response and it lists The available representations and their properties (e.g., language, charset, etc.) The Alternate header has been reserved for this purpose, but its specification has not been completed Hence, server-driven negotiation is the only usable form 2007 cs236607

The Vary Headerb Content negotiation and caching can interact in subtle ways Hence, the Vary header (that was mentioned earlier) 2007 cs236607

Warnings (New in HTTP/1.1) The Warning header has codes indicating some potential problems with the response, even if the status code is 200 (OK) For example, when returning a stale response because it could not be validated Warnings are divided into two types based on the first digit (out of three) digit Warning of one type should be deleted after a successful revalidation and those of the second type should be retained Hence, this mechanism is extensible to future warning codes 2007 cs236607

New Status Codes in HTTP/1.1 100 (Continue) 206 (Partial Content) 300 (Multiple Choices) 409 (Conflict) is used when a request conflicts with the current state of the resource (e.g., a PUT request might violate a versioning policy) 410 (Gone) is used when a resource has been removed permanently It indicates that links to the resource should be deleted 2007 cs236607

Links Request for Comments 2616 (rfc2616) A caching tutorial at http://www.mnot.net/cache_docs/ 2007 cs236607