Java Technology and Applications
240-527, CoE Masters Programme, PSU
Semester 2, 2003-2004

7. HTTP

Objectives
  to explain the Hypertext Transfer Protocol (HTTP)

Overview
  1. How a Browser Works
  2. HTTP Transactions
  3. Client Request Methods
  4. HTTP Protocol Versions
  5. Server Response Codes
  6. Some Advanced Features
  7. More Information

1. How a Browser Works
  Browsers use the HTTP protocol to communicate with Web servers
  HTTP is a request/response protocol
  [Diagram: the client browser sends a request across the network to the Web server, which sends back a response]
1.1. Details of a Client Request
From a browser, I request: http://fivedots.coe.psu.ac.th/~ad/
The browser connects to the site fivedots.coe.psu.ac.th at port 80, and sends the request:

GET /~ad/ HTTP/1.1                (HTTP method/command, URL, HTTP version used by client)
Host: fivedots.coe.psu.ac.th
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
Accept: */*
Accept-Language: en
Accept-Encoding: gzip,deflate,compress,identity
Keep-Alive: 300
Connection: keep-alive
(various header information; one header per line)

Details of a Server Response
HTTP/1.1 200 OK                   (HTTP version used by server, status code and text)
Date: Sun, 12 Oct 2003 04:20:51 GMT
Server: Apache/1.3.9 (Unix) Debian/GNU PHP/4.0.3pl1
X-Powered-By: PHP/4.0.3pl1
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1

<html> <head> <title>Andrew Davison's Home Page at PSU</title> </head>
<body bgcolor=#ffffff text=#000000>
: // rest of HTML text for page
Part of my Home Page
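The request shown in section 1.1 can be sketched in Java with a plain socket. This is only an illustration: the class name SimpleGetSketch and its two methods are made up for this example, and the request is cut down to a few headers. It sends Connection: close (not the keep-alive used by the browser) so that the server closes the socket after one response.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class SimpleGetSketch {

    // Build the text of a minimal HTTP/1.1 GET request.
    // Each line ends with CRLF, and an empty line ends the headers.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Accept: */*\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Connect to port 80, send the request, and return the status line
    // of the response (e.g. "HTTP/1.1 200 OK").
    static String fetchStatusLine(String host, String path) throws IOException {
        try (Socket sock = new Socket(host, 80)) {
            OutputStream out = sock.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                new InputStreamReader(sock.getInputStream(), StandardCharsets.ISO_8859_1));
            return in.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        // Without arguments, only print the request text (no network access).
        if (args.length < 2) {
            System.out.print(buildGetRequest("fivedots.coe.psu.ac.th", "/~ad/"));
        } else {
            System.out.println(fetchStatusLine(args[0], args[1]));
        }
    }
}
```

Running the class with a host and path as arguments (e.g. fivedots.coe.psu.ac.th /~ad/) makes a real connection; whether that host is reachable depends on where the program is run.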
1.2. Web Page Images My home page contains several images. The browser sees them in the text of the Web page: e.g. <img src="me.jpg" align="right" alt="[PIC of Andrew]"> The browser automatically requests each one.
An Image Request the page where the link to the image is located GET /~ad/me.jpg HTTP/1.1 Referer: http://fivedots.coe.psu.ac.th/~ad/ Host: fivedots.coe.psu.ac.th User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01 Accept: */* Accept-Language: en Accept-Encoding: gzip,deflate,compress,identity Keep-Alive: 300 Connection: keep-alive
The Image Response HTTP/1.1 200 OK Date: Sun, 12 Oct 2003 04:20:55 GMT Server: Apache/1.3.9 (Unix) Debian/GNU PHP/4.0.3pl1 Last-Modified: Tue, 17 Oct 2000 09:40:05 GMT ETag: "1bf29-1194-39ec1e75" Accept-Ranges: bytes Content-Length: 4500 Keep-Alive: timeout=15, max=99 Connection: Keep-Alive Content-Type: image/jpeg; charset=iso-8859-1 // ... data of the JPEG file
1.3. Clicking on a Link In the browser, if I click on the link labelled 'AIT', then the browser examines the associated HTML: <a href="http://www.cs.ait.ac.th/">AIT</a> The browser then connects to www.cs.ait.ac.th at port 80, and requests the top page: continued
sent to www.cs.ait.ac.th GET / HTTP/1.1 Referer: http://fivedots.coe.psu.ac.th/~ad/ Host: www.cs.ait.ac.th User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01 Accept: */* Accept-Language: en Accept-Encoding: gzip,deflate,compress,identity Keep-Alive: 300 Connection: keep-alive
Server Response This server uses HTTP 1.0 HTTP/1.0 200 OK Date: Sun, 12 Oct 2003 06:08:24 GMT Server: Apache/1.3.12 Ben-SSL/1.41 PHP/4.0.1pl2 Last-Modified: Fri, 11 Apr 2003 02:48:54 GMT ETag: "214d69-543b-3ad3c616" Accept-Ranges: bytes Content-Length: 21563 Content-Type: text/html Age: 120 X-Cache: MISS from cache3.psu.ac.th Connection: keep-alive <HTML> <HEAD> // ... rest of Web page text
The New Page
1.4. Getting a Page with Telnet
In CoE/PSU, the request needs to be 'local'.

ad@calvin$ telnet fivedots.coe.psu.ac.th 80
Trying 172.30.0.5...
Connected to fivedots.coe.psu.ac.th.
Escape character is '^]'.
GET /~ad/index.html HTTP/1.0      (two newlines required after the request)

HTTP/1.0 200 OK                   (start of the response)
Date: Wed, 22 Oct 2003 05:07:26 GMT
Server: Apache/1.3.12 Ben-SSL/1.41 PHP/4.0.1pl2
Last-Modified: Wed, 11 Jun 2003 02:48:54 GMT
ETag: "214d69-543b-3ad3c616"
Accept-Ranges: bytes
// ... rest of headers and HTML text of page
1.5. HTTP and Web Forms
The Form HTML Code <form method="post" action= "http://fivedots.coe.psu.ac.th/cgi-bin/ad/echoer"> <input TYPE="text" NAME="pat1" SIZE="15" MAXLENGTH="15" VALUE=""> <input TYPE="text" NAME="pat2" SIZE="15" MAXLENGTH="15" VALUE=""> <input TYPE="text" NAME="pat3" SIZE="15" MAXLENGTH="15" VALUE=""> <input TYPE="text" NAME="pat4" SIZE="15" MAXLENGTH="15" VALUE=""> <input TYPE="text" NAME="pat5" SIZE="15" MAXLENGTH="15" VALUE=""></p> <br> <p><input TYPE="submit" VALUE="Submit"> <input TYPE="reset" VALUE="Clear"> </form>
Form Input and Output
Form Input Request
POST /cgi-bin/ad/echoer HTTP/1.1  (the HTTP POST method)
Referer: http://fivedots.coe.psu.ac.th/~ad/echoer/eform.html
Host: fivedots.coe.psu.ac.th
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01
Accept: */*
Accept-Language: en
Accept-Encoding: gzip,deflate,compress,identity
Keep-Alive: 300
Connection: keep-alive
Content-type: application/x-www-form-urlencoded
Content-Length: 39

pat1=hello&pat2=&pat3=world&pat4=&pat5=
Server Response HTTP/1.1 200 OK Date: Sun, 12 Oct 2003 08:30:07 GMT Server: Apache/1.3.9 Debian/GNU PHP/4.0.3pl1 Keep-Alive: timeout=15, max=100 Connection: Keep-Alive Transfer-Encoding: chunked Content-Type: text/html; charset=iso-8859-1 <html><head><title>Query Result</title></head> <body background="http://fivedots.coe.psu.ac.th/~ad/chalk.jpg"><H1 align=center>Query Result</H1> // ... rest of page
1.6 Proxies Most clients and servers do not communicate directly the client must send its request via a proxy the proxy acts as a firewall and/or cache At PSU, most Web requests must go through the cache.psu.ac.th proxy this is set up in the browser's preferences continued
In other applications, it may be necessary to explicitly communicate with the proxy this is done by connecting to the proxy, and sending it the full URL of the page required
Using a Proxy with Telnet
Students should be able to do this.

ad@fivedots$ telnet cache.psu.ac.th 8080
Trying 192.168.98.6...
Connected to proxy6.psu.ac.th.
Escape character is '^]'.
GET http://www.student.math.uwaterloo.ca/~cs488/ HTTP/1.0

HTTP/1.0 200 OK                   (start of the response)
Date: Thu, 21 Nov 2002 06:01:31 GMT
Server: Apache/1.3.27 (Unix) mod_perl/1.21
Last-Modified: Wed, 20 Nov 2002 12:00:21 GMT
ETag: "1b66a-2234-3ddb7955"
:
: Accept-Ranges: bytes Content-Length: 8756 Content-Type: text/html Age: 3263 X-Cache: HIT from cache.psu.ac.th Proxy-Connection: close <html> // ... rest of Web page text </html> Connection closed by foreign host. ad@fivedots$
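The difference between a direct request and a proxy request is only in the request line: a direct request names the path, while a proxy request names the full URL. A small Java sketch of that rule follows; the class ProxyRequestLineSketch and its method are hypothetical names chosen for this example.

```java
import java.net.URI;

// Hypothetical helper class, not part of any library.
class ProxyRequestLineSketch {

    // Return the GET request line for a URL.
    // viaProxy == true  : the full URL is sent (the socket is opened to the proxy)
    // viaProxy == false : only the path (and query) is sent (the socket is opened to the site)
    static String requestLine(String url, boolean viaProxy) {
        URI uri = URI.create(url);
        String path = uri.getRawPath();
        if (path == null || path.isEmpty()) {
            path = "/";                      // "http://host" means the top page
        }
        if (uri.getRawQuery() != null) {
            path += "?" + uri.getRawQuery();
        }
        String target = viaProxy ? url : path;
        return "GET " + target + " HTTP/1.0";
    }

    public static void main(String[] args) {
        String url = "http://www.student.math.uwaterloo.ca/~cs488/";
        System.out.println(requestLine(url, true));   // line to send to the proxy
        System.out.println(requestLine(url, false));  // line to send to the site itself
    }
}
```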
2. HTTP Transactions
[Diagram: the client browser sends a request across the network to the Web server, which sends back a response]

Request (client browser to Web server):
  Method URL Version
  General header
  Request header
  Entity header
  Entity body

Response (Web server to client browser):
  Version Status Reason
  General header
  Response header
  Entity header
  Entity body

Client Request Example
POST /cgi-bin/ad/echoer HTTP/1.1                   (Method URL Version)
Referer: http://fivedots...                        (Request headers)
User-Agent: Mozilla/5.0 ...
Accept: */*
Accept-Language: en
Accept-Encoding: gzip,...
Keep-Alive: 300                                    (General headers)
Connection: keep-alive
Content-type: application/x-www-form-urlencoded    (Entity headers)
Content-Length: 39

pat1=hello&pat2=&pat3=world&pat4=&pat5=            (Entity body)
Request Components
HTTP methods:
  GET, POST, HEAD, PUT, DELETE
  OPTIONS and TRACE (HTTP 1.1)
  other non-standardized methods
General headers
  optional general information such as the current date/time, or network characteristics
Request headers
  information about the client, used by the server
  e.g. browser info., document formats that the client can understand
Entity headers
  used when an entity (a Web document) is about to be sent
  e.g. encoding scheme, length, type, origin
Headers may be sent in any order. Header names are case-insensitive e.g. Content-Type == Content-type
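Because header names are case-insensitive, a program that stores headers should look them up without regard to case. One way to sketch this in Java is a TreeMap with String.CASE_INSENSITIVE_ORDER; the class HeaderMapSketch is a made-up name for this example, and the parser ignores details such as repeated or folded headers.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper class, not part of any library.
class HeaderMapSketch {

    // Parse "Name: value" lines into a map whose keys ignore case.
    static Map<String, String> parseHeaders(String[] lines) {
        Map<String, String> headers = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        for (String line : lines) {
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue;                    // skip lines that are not headers
            }
            String name = line.substring(0, colon).trim();
            String value = line.substring(colon + 1).trim();
            headers.put(name, value);
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> h = parseHeaders(new String[] {
            "Content-type: application/x-www-form-urlencoded",
            "Content-Length: 39"
        });
        // Both spellings find the same header.
        System.out.println(h.get("Content-Type"));
        System.out.println(h.get("content-length"));
    }
}
```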
Server Response Example
HTTP/1.1 200 OK                    (Version Status Reason)
Date: Tue,...                      (General headers)
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Server: Apache...                  (Response headers)
Content-Type: text/html;...        (Entity headers)

<html>                             (Entity body)
// ... rest of page
Server Components The general and entity headers are the same as those used in a client request. Response header gives the client information about the server configuration e.g. what HTTP methods are supported, request authorization details, or server time-out report
Some Other Headers

General Headers
  Cache-Control         caching behaviour
  Connection            should connection close after this transaction
  MIME-Version          message encoding
  Pragma                directives for proxies
  Via                   info about processing by gateways and proxies between the client and server

Request Headers
  Authorization         to request restricted docs.
  Cookie                send name=value info
  Host                  required address & port info
  If-Modified-Since     get doc. if newer
  If-Match              get doc. if matches etags
  If-Range              get part of a doc. if changed
  Max-Forwards          limits no. of proxies/gateways
  Proxy-Authorization   for proxy
  Range                 only get part of a doc

Response Headers
  Accept-Ranges         will accept range requests
  Age                   age of doc in seconds
  Proxy-Authenticate    gives auth. scheme
  Public                supported methods
  Retry-After           try again after given time
  Set-Cookie            sends a name=value pair
  Warning               info used for caching
  WWW-Authenticate      gives auth scheme for access to Web pages

Entity Headers
  Allow                 methods allowed on URL
  Content-Location      useful if a doc is stored in several locations
  Content-Range         range of partial doc sent
  ETag                  entity tag for the doc
  Expires               when content may change
  Last-Modified         when doc last changed
3. Client Request Methods GET retrieve the specified document POST for sending (form) information HEAD get information about the document, but not the actual document PUT store the specified document on the server continued
DELETE
  delete the specified document on the server
TRACE
  asks that proxies/gateways add information to the headers of the request, which is sent back in the response
OPTIONS
  ask the server to send info about the HTTP methods it supports
3.1. The GET Method The main purpose of GET is to request a document from a server see earlier examples in section 1 But the response can be generated in various ways: a file on the Web server the output of a CGI script the script may examine server-side hardware, files, or do some special calculations
CGI Diagram the Web/Internet request request becomes input response CGI script Client browser Web server output becomes response
A CGI Request Data for a CGI script is passed as extra name=value arguments added to the URL: GET /cgi-bin/create.pl?user=util-tester& pass=1234 HTTP/1.0 Referer: ... User-Agent: ... : The arguments are URL-encoded. two arguments
URL Encoding
name=value pairs are combined into a single string separated by &'s.
This is added to the end of the URL after a ?
Certain special characters are converted to hexadecimal preceded by a %.
  e.g. '#' becomes %23, '/' becomes %2F
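In Java, the standard class java.net.URLEncoder carries out this encoding (it also turns a space into '+'). The sketch below builds the argument string for the create.pl example; the class QueryStringSketch and its methods are hypothetical names for this illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of any library.
class QueryStringSketch {

    // URL-encode one name or value.
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available in a Java runtime.
            throw new IllegalStateException(e);
        }
    }

    // Join name=value pairs with '&', encoding each name and value.
    static String toQueryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(e.getKey())).append('=').append(encode(e.getValue()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("user", "util-tester");
        params.put("pass", "1234");
        // The string that follows the ? in the URL.
        System.out.println("/cgi-bin/create.pl?" + toQueryString(params));
        System.out.println(encode("#") + " " + encode("/"));
    }
}
```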
3.2. The POST Method The main purpose of the POST method is to send form information to a server see the example in section 1.5 Most servers use CGI programs to process form requests. The text in the form name=value data is URL encoded.
Forms can use GET The <form> tag in HTML can also be used to send data in the GET format: <form method="get" action="http://fivedots.coe.psu.ac.th/ cgi-bin/create.pl"> <input name="user"> <input name="pass" type="password"> <input type="submit" value="Submit"> </form>
Which Method to Use? The GET method adds form input to the end of the URL, and there is often a maximum length limit e.g. the URL string must be 255 chars or less For large input, the POST method is better since there is no limit on the size of the entity body in the request.
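With POST, the encoded form data moves from the URL into the entity body, and the Content-Length header must give the body's length in bytes. A minimal Java sketch of building such a request is given below. The class FormPostSketch is a made-up name, the body is assumed to be URL-encoded already, and only the headers needed for the illustration are included.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of any library.
class FormPostSketch {

    // Build the text of a POST request carrying URL-encoded form data.
    static String buildPost(String host, String path, String encodedBody) {
        // Content-Length counts bytes; URL-encoded data is plain ASCII.
        int length = encodedBody.getBytes(StandardCharsets.US_ASCII).length;
        return "POST " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Content-Type: application/x-www-form-urlencoded\r\n"
             + "Content-Length: " + length + "\r\n"
             + "\r\n"                        // empty line separates headers and body
             + encodedBody;
    }

    public static void main(String[] args) {
        String body = "pat1=hello&pat2=&pat3=world&pat4=&pat5=";
        System.out.println(buildPost("fivedots.coe.psu.ac.th", "/cgi-bin/ad/echoer", body));
    }
}
```

For the form data of section 1.5, the computed length matches the Content-Length: 39 header seen in the browser's request.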
3.3. The HEAD Method The HEAD method returns information about a document: this includes its modification time, its size, its type, and details about its server this information is useful in guiding/speeding up search engines and browsers
HEAD using Telnet
ad@calvin$ telnet fivedots.coe.psu.ac.th 80
Connected to fivedots.coe.psu.ac.th.
HEAD /~ad/index.html HTTP/1.0

HTTP/1.0 200 OK                   (start of the response)
Date: Sun, 12 Oct 2003 06:42:48 GMT
Server: Apache/1.3.12 Ben-SSL/1.41 PHP/4.0.1pl2
Last-Modified: Tue, 29 Jul 2003 11:11:51 GMT
ETag: "1f1f6e-522-3982bbf7"
Accept-Ranges: bytes
Content-Length: 1314
Content-Type: text/html
Age: 157
Connection: close

Connection closed by foreign host.
ad@calvin$
3.4. The PUT Method The PUT method is used for uploading files to a server PUT URL HTTP-version used in HTML editors such as FrontPage Usually involves an authorization phase when the server asks for a user name and password before accepting the PUT this is processed by FrontPage using details entered by the user
3.5. The DELETE Method The DELETE method deletes the specified file: DELETE URL HTTP-version The server will usually ask for authorization information before carrying out the request.
3.6. The TRACE Method The TRACE method allows a programmer to see how the client's request is passed through proxies/gateways to the server TRACE URL HTTP-version The server echoes the request back together with a Via header (and other optional headers).
TRACE using Telnet
ad@calvin$ telnet cache.psu.ac.th 8080
Trying 192.168.98.6...
Connected to proxy6.psu.ac.th.
Escape character is '^]'.
TRACE http://www.cs.ait.ac.th HTTP/1.0

HTTP/1.0 200 OK                   (start of the response)
Date: Wed, 22 Oct 2003 07:11:20 GMT
Server: Stronghold/2.4.2 Apache/1.3.6 C2NetEU/2412 (Unix)
Content-Type: message/http
Age: 118
X-Cache: MISS from cache.psu.ac.th
Proxy-Connection: close

TRACE / HTTP/1.0
Cache-Control: max-age=259200
Connection: keep-alive
Host: www.cs.ait.ac.th
Via: 1.0 cache.psu.ac.th:8080 (Squid/2.5.STABLE1)
X-Forwarded-For: unknown

Connection closed by foreign host.
ad@calvin$
3.7. The OPTIONS Method The OPTIONS method allows a client to obtain information about what methods a server supports OPTIONS * HTTP-version Often OPTIONS is disabled. Many servers require the Host header as well.
OPTIONS using Telnet
ad@calvin$ telnet fivedots.coe.psu.ac.th 80
Trying 172.30.0.5...
Connected to fivedots.coe.psu.ac.th.
Escape character is '^]'.
OPTIONS * HTTP/1.1                (or use HTTP/1.0 with no extra headers)
Host: fivedots.coe.psu.ac.th
Connection: close

HTTP/1.1 200 OK                   (start of the response)
Date: Sun, 12 Oct 2003 07:37:44 GMT
Server: Apache/1.3.9 Debian/GNU PHP/4.0.3pl1
Content-Length: 0
Allow: GET, HEAD, OPTIONS, TRACE
Connection: close

Connection closed by foreign host.
ad@calvin$
4. HTTP Protocol Versions
HTTP 0.9
  only supported the GET method
  requests and responses had no extra header information
  a GET of a non-existent page caused the server to return nothing
  no media types: only text/HTML was supported
HTTP 1.0 introduced headers, media types, more methods, caching, authentication, persistent connections headers mean that "meta" information can be transferred between clients and servers media types supported with Accept (Request)and Content-Type (Entity) headers continued
caching supported with the Last-Modified (Entity) and If-Modified-Since (Request) headers authentication supported with the Authorization (Request) and WWW-Authenticate (Response) headers persistent connections supported with the (non-standard) Connection header, with a keep-alive value
HTTP 1.1 introduced a better implementation of persistent connections, multihoming, entity tags, byte ranges, digest authentication
  persistent connection is the default in HTTP 1.1
    only need Connection: close at the end
  multihoming means that a server can respond to different hostnames
    HTTP 1.1 requires the Host header in all requests
  entity tags (etags) aid caching by representing each document (entity) with a unique identifier
    gets round the problem of the same document at different sites
    etags are used in the If-Match and If-None-Match request headers
  byte ranges make it possible to retrieve only part of a document
    useful for downloading after an interrupt, and for streaming media
    supported with the Range request header
  digest authentication allows username and password information to be transferred as a hash value (a digest) instead of as readable text
    makes it much harder for hackers to steal password details
5. Server Response Codes
The server response code is the number after the HTTP version string in the server response:
  HTTP/1.1 200 OK
  Date: ....
  :
The text after the number ("OK") is a description of the code.

Response Code Ranges
  Code Range   Meaning
  100-199      Information
  200-299      Client request successful
  300-399      Client request redirected; more action needed
  400-499      Client request incomplete
  500-599      Server error
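A client can pick the code out of the status line and use its first digit to decide what kind of response it has received. The Java sketch below shows one way to do this; the class StatusLineSketch is a hypothetical name, and the category strings are simply the meanings from the range table.

```java
// Hypothetical helper class, not part of any library.
class StatusLineSketch {

    // Extract the numeric code from a status line such as "HTTP/1.1 200 OK".
    static int statusCode(String statusLine) {
        // Split into at most 3 parts: version, code, reason text.
        String[] parts = statusLine.trim().split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    // Map a code to the meaning of its range.
    static String category(int code) {
        switch (code / 100) {
            case 1:  return "Information";
            case 2:  return "Client request successful";
            case 3:  return "Client request redirected; more action needed";
            case 4:  return "Client request incomplete";
            case 5:  return "Server error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        int code = statusCode("HTTP/1.1 200 OK");
        System.out.println(code + ": " + category(code));
    }
}
```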
Some Common Codes
  Code  Meaning
  200   OK
        response contains data
  301   Moved
        new location given in Location response header
  305   Use Proxy
        proxy location in Location
  401   Unauthorized
        client lacked proper authorization to get the page; details sent in the WWW-Authenticate response header
  404   Not Found
        no page at the URL
  407   Proxy Authentication Required
        the client must obtain proxy authorization; details sent in the Proxy-Authenticate response header
  503   Service Unavailable
        further details may be given in the Retry-After response header
6. Some Advanced Features Details on: media types client-side caching retrieving parts of a document authorization cookies
6.1. Media Types The client tells the server which media types it can handle using the Accept request header. The server tries to return information in a preferred media type, and gives the type in the Content-Type entity header.
Typical Client Accept Headers
Newer browsers:
  Accept: image/gif, image/jpeg, */*
Older browsers:
  Accept: image/gif
  Accept: image/jpeg
  Accept: */*
6.2. Client-side Caching Two approaches: caching based on the document age caching based on the document's entity tag (etag) Caching can be configured using the general header Cache-Control it can be switched off or set to a certain amount of time e.g. Cache-Control: no-cache continued
Cache-Control replaces the Pragma header of HTTP 1.0, which could only switch off caching:
  Pragma: no-cache

Caching using Age
The client sends the request header If-Modified-Since:
  If-Modified-Since: Fri, 15 Jun 2001 01:00:00 GMT
The server returns response code 304 (Not Modified) if the document has not been modified since that time, and the client can use its cached version.
Otherwise it returns 200 and the page.
There is also an If-Unmodified-Since header.
The server can return an Expires header which states when the document may change.
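The server's side of this decision can be sketched in Java as a comparison of two HTTP dates. The class ConditionalGetSketch is a made-up name, and the sketch assumes both dates are in the RFC 1123 form (e.g. "Fri, 15 Jun 2001 01:00:00 GMT"), which java.time can parse with DateTimeFormatter.RFC_1123_DATE_TIME; real servers also accept older date formats.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper class, not part of any library.
class ConditionalGetSketch {

    // Decide the response code for a GET.
    // lastModified    : the document's Last-Modified time
    // ifModifiedSince : value of the If-Modified-Since request header, or null if absent
    static int statusFor(String lastModified, String ifModifiedSince) {
        if (ifModifiedSince == null) {
            return 200;                      // ordinary GET: always send the page
        }
        DateTimeFormatter fmt = DateTimeFormatter.RFC_1123_DATE_TIME;
        ZonedDateTime modified = ZonedDateTime.parse(lastModified, fmt);
        ZonedDateTime since = ZonedDateTime.parse(ifModifiedSince, fmt);
        // Send the page only if it changed after the client's cached copy.
        return modified.isAfter(since) ? 200 : 304;
    }

    public static void main(String[] args) {
        String since = "Fri, 15 Jun 2001 01:00:00 GMT";
        System.out.println(statusFor("Tue, 17 Oct 2000 09:40:05 GMT", since));
        System.out.println(statusFor("Tue, 29 Jul 2003 11:11:51 GMT", since));
    }
}
```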
Caching using Etags If the server is using etags, it will return an ETag header with the document. The client can check documents in its cache by using the If-Match or If-None-Match headers with etags.
6.3. Retrieving Parts of a Doc. In HTTP 1.1, a client does not need to get all of a document at once it can retrieve it in pieces, specified using byte ranges For this to be possible, the server must send a response containing the Accept-Ranges header: Accept-Ranges: bytes continued
Then the client can request the data in pieces:
  GET /largefile.html HTTP/1.1
  // other headers
  Range: bytes=0-65535
The response has status code 206 (Partial Content) and includes a Content-Range header:
  HTTP/1.1 206 Partial Content
  // other headers
  Content-Range: bytes 0-65535/83028576
  // data

The client can include an If-Range header so that the range is only sent if the document has not been updated; if it has changed, the server sends the whole new document:
  GET /largefile.html HTTP/1.1
  // other headers
  If-Range: Fri, 15 Jun 2001 01:00:00 GMT
  Range: bytes=0-65535
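A client that downloads a document in pieces has to work out the Range header for each piece. The Java sketch below shows the arithmetic, assuming the HTTP 1.1 syntax in which the range is written with the unit name, as in bytes=first-last; the class RangeSketch and its method are hypothetical names for this example.

```java
// Hypothetical helper class, not part of any library.
class RangeSketch {

    // Build the Range header for the chunk starting at 'start'.
    // Byte positions are inclusive and start at 0, so a chunk of
    // 65536 bytes starting at 0 is bytes 0-65535.
    static String rangeHeader(long start, long chunkSize, long totalLength) {
        long end = Math.min(start + chunkSize, totalLength) - 1;
        return "Range: bytes=" + start + "-" + end;
    }

    public static void main(String[] args) {
        long total = 83028576L;              // total length from the Content-Range example
        long chunk = 65536L;
        // Print the headers for the first three chunks.
        for (long start = 0; start < 3 * chunk; start += chunk) {
            System.out.println(rangeHeader(start, chunk, total));
        }
    }
}
```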
6.4. Authorization
  1) Ordinary request
  2) Server denies access and sends WWW-Authenticate header
  3) Username and password obtained
  4) Send request again but with Authorization header
  5) Response

The WWW-Authenticate header specifies the authorization method required by the server:
  usually BASIC, which requires a "username:password" string encoded in base64
  BASIC also includes a realm, which is a class of users
1) Initial Request GET /sample.html HTTP/1.1 User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01 Accept: */* Accept-Language: en Accept-Encoding: gzip,deflate,compress,identity Keep-Alive: 300 Connection: keep-alive
2) Access Denied
HTTP/1.0 401 Unauthorized
Server: Squid/2.2.STABLE5
Mime-Version: 1.0
Date: Sun, 12 Oct 2003 08:59:09 GMT
Content-Type: text/html
WWW-Authenticate: Basic realm="Systems Administrator"
3) The Browser Dialog
4) Send Request Again GET /sample.html HTTP/1.1 User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; m18) Gecko/20010131 Netscape6/6.01 Accept: */* Accept-Language: en Accept-Encoding: gzip,deflate,compress,identity Authorization: BASIC jhg235gjmg5jkjkgj24g42g
5) Response
HTTP/1.0 200 OK
Server: Squid/2.2.STABLE5
Mime-Version: 1.0
Date: Sun, 12 Oct 2003 09:01:13 GMT
Content-Type: text/html
Content-Length: 1029

// HTML of sample.html page
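The Authorization header sent in step 4 can be built in Java with the standard java.util.Base64 class. In this sketch the class BasicAuthSketch and the user name and password are invented for illustration. Note that base64 is only an encoding, not encryption: anyone who sees the header can decode it.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, not part of any library.
class BasicAuthSketch {

    // Build the Authorization header for the Basic scheme:
    // base64 of the string "username:password".
    static String authorizationHeader(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
            .encodeToString(credentials.getBytes(StandardCharsets.ISO_8859_1));
        return "Authorization: Basic " + encoded;
    }

    public static void main(String[] args) {
        // Made-up credentials, used only for this example.
        System.out.println(authorizationHeader("Aladdin", "open sesame"));
    }
}
```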
6.5 Cookies Client-side cookies are used to store client-specific information on the client's machine used by the browser when it accesses the same page again Not part of the HTTP specification, but used in every browser.
Cookie Usage 1) Ordinary request 2) Response and a Set-Cookie header 3) The browser stores the cookie 4) Later send another request with Cookie header included 5) The server uses the cookie information. 6) Customised response and an updated Set-Cookie header
1) & 2) Request and Response POST /www.whosis.com/order.pl HTTP/1.0 // client headers type=newCust&firstname=Andrew HTTP/1.0 200 OK // server headers Set-Cookie: acct=02746284
3) & 4) Storage and Later Use The browser stores the cookie information: www.whosis.com/order.pl acct=02746284 Days/months later, another request: POST /www.whosis.com/order.pl HTTP/1.0 // client headers here Cookie: acct=02746284 type=oldCust
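What the browser does between steps 2 and 4 can be sketched in Java: take the name=value pair out of the Set-Cookie header and send it back later in a Cookie header. The class CookieSketch is a hypothetical name; a real browser also stores and checks any attributes (such as a path or an expiry time) that may follow the pair after a semicolon, which this sketch just drops.

```java
// Hypothetical helper class, not part of any library.
class CookieSketch {

    // Turn a "Set-Cookie: name=value; attributes..." response header
    // into the "Cookie: name=value" request header sent back later.
    static String cookieHeaderFrom(String setCookieLine) {
        // Text after the first ':' is the header value.
        String value = setCookieLine.substring(setCookieLine.indexOf(':') + 1).trim();
        // Keep only the name=value pair; ignore attributes after ';'.
        int semi = value.indexOf(';');
        String pair = (semi >= 0 ? value.substring(0, semi) : value).trim();
        return "Cookie: " + pair;
    }

    public static void main(String[] args) {
        System.out.println(cookieHeaderFrom("Set-Cookie: acct=02746284"));
    }
}
```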
7. More Information
The World Wide Web Consortium:
  http://www.w3.org
HTTP/1.1 Specification:
  http://www.w3.org/Protocols/HTTP/rfc2616/rfc2616.html