1
FIT5170 Week 6 HTTP and Java
2
Introduction The World Wide Web is a major distributed system, with millions of users. A site becomes a web host by running an HTTP server. Web clients are typically users running a web browser, but there are many other 'user agents', such as web spiders and web application clients.

The World Wide Web (abbreviated WWW or W3, commonly known as the Web) is a system of interlinked hypertext documents accessed via the Internet. With a web browser, one can view web pages that may contain text, images, videos, and other multimedia, and navigate between them via hyperlinks.

The British engineer Sir Tim Berners-Lee, now Director of the World Wide Web Consortium (W3C), wrote a proposal in March 1989 for what would eventually become the World Wide Web. At CERN, a European research organisation near Geneva, Berners-Lee and the Belgian computer scientist Robert Cailliau proposed in 1990 to use hypertext "to link and access information of various kinds as a web of nodes in which the user can browse at will", and they publicly introduced the project in December of the same year.

A spider is a program that visits web sites and reads their pages and other information in order to create entries for a search engine index. The major search engines on the Web all have such a program, also known as a "crawler" or a "bot".
3
Servers There are a number of web servers available. They all use the same protocol for communication with clients, and they differ in capabilities such as speed, reliability, etc. The original servers were the CERN server and the NCSA server; these have given way to servers from Apache, Microsoft, NGINX, Google, etc. (more than 50 web server packages exist nowadays).

NCSA is the National Center for Supercomputing Applications; NCSA HTTPd was an early web server developed there. CERN is the European Organization for Nuclear Research. In 1989 Tim Berners-Lee proposed to his employer CERN a new project, which had the goal of easing the exchange of information between scientists by using a hypertext system. As a result of this project, in 1990 Berners-Lee wrote two programs: a browser called WorldWideWeb, and the world's first web server, which ran on NeXTSTEP. (Today that machine is on exhibition at CERN's public museum, Microcosm.)

NGINX (pronounced "engine-x") is a free, open-source, high-performance HTTP server and reverse proxy, as well as an IMAP/POP3 proxy server. Igor Sysoev started development of Nginx in 2002, with the first public release in 2004. Nginx now hosts nearly 12.18% (22.2M) of active sites across all domains, and is known for its high performance, stability, rich feature set, simple configuration, and low resource consumption.

The first web server in the U.S.A. was installed on December 12, 1991 by Bebo White at SLAC [1] after returning from a sabbatical at CERN. Between 1991 and 1994 the simplicity and effectiveness of the early Web technologies made it easy to port them to many different operating systems, and to spread their use among many different groups of people: first in scientific organisations, then in universities, and finally in industry. In 1994 Tim Berners-Lee founded the World Wide Web Consortium to regulate the further development of the many technologies involved (HTTP, HTML, etc.) through a standardization process. The years since are recent history, with exponential growth (explosive after 2000) in the number of web sites and, of course, in the number of web servers.

Sir Tim Berners-Lee
4
Servers The primary purpose of a web server is to deliver a document on request to a client. The document may be text, an image file, or some other type of file, and it is identified by a name called a URL (Uniform Resource Locator). If the server stores that particular URL (or can generate content for that URL), then it returns the document as the message reply.

Although web server programs differ in detail, they all share some basic common features. Every web server operates by accepting HTTP requests from the network and providing an HTTP response to the requester. The HTTP response typically consists of an HTML document, but can also be a raw text file, an image, or some other type of document (defined by MIME types). If something is wrong with the client's request, or something goes wrong while trying to serve it, the server has to send an error response, which may include a custom HTML or text message to better explain the problem to end users.

A uniform resource identifier (URI) is a string of characters used to identify the name of a resource. The most common form of URI is the uniform resource locator (URL), frequently referred to informally as a web address. More rarely seen in usage is the uniform resource name (URN), which was designed to complement URLs by providing a mechanism for the identification of resources in particular namespaces.
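As a sketch of the reply a server assembles (a hypothetical helper for illustration, not code from any real server), the Java class below builds the pieces described above: a Status-Line, headers, a blank line, then the entity body, with a small HTML page for the error case:

```java
// A minimal, hypothetical sketch of how a server might assemble an
// HTTP/1.0-style response for a document it has found (or not found).
public class HttpResponseBuilder {
    // Build a full response: Status-Line, headers, blank line, then the entity body.
    public static String ok(String mimeType, String body) {
        return "HTTP/1.0 200 OK\r\n"
             + "Content-Type: " + mimeType + "\r\n"
             + "Content-Length: " + body.getBytes().length + "\r\n"
             + "\r\n"
             + body;
    }

    // Error responses may carry a small HTML page explaining the problem to end users.
    public static String notFound(String url) {
        String page = "<html><body><h1>404 Not Found</h1><p>" + url + "</p></body></html>";
        return "HTTP/1.0 404 Not Found\r\nContent-Type: text/html\r\nContent-Length: "
             + page.getBytes().length + "\r\n\r\n" + page;
    }

    public static void main(String[] args) {
        System.out.println(ok("text/plain", "Hello"));
    }
}
```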
5
Browsers The purpose of a browser is to allow the user to request documents and to display them in some meaningful way. Browsers differ in the version of HTML they support, in extra features such as non-standard extensions, in the amount of customisation, speed, caching capabilities, etc. Browsers include IE, Safari, Chrome, Netscape, Firefox, Opera, Lynx and Chimera. (Browser market share, July 2007: Internet Explorer, Firefox, Safari, Opera, Netscape, Opera Mini, other.)

A browser is a software application that enables a user to display and interact with text, images, and other information, typically located on a web page at a website on the World Wide Web or a local area network. Text and images on a web page can contain hyperlinks to other web pages at the same or a different website. Web browsers allow a user to quickly and easily access information provided on many web pages at many websites by traversing these links. Web browsers format HTML information for display, so the appearance of a web page may differ between browsers.

Web browsers communicate with web servers primarily using HTTP (hypertext transfer protocol) to fetch web pages. HTTP allows web browsers to submit information to web servers as well as fetch web pages from them. The most commonly used version is HTTP/1.1, fully defined in RFC 2616. HTTP/1.1 has its own required standards that Internet Explorer does not fully support, but most other current-generation web browsers do.

Pages are located by means of a URL (uniform resource locator, RFC 1738), which is treated as an address, beginning with http: for HTTP access. Many browsers also support a variety of other URL types and their corresponding protocols, such as gopher: for Gopher (a hierarchical hyperlinking protocol), ftp: for FTP (file transfer protocol), rtsp: for RTSP (real-time streaming protocol), and https: for HTTPS (an SSL-encrypted version of HTTP).
6
URLs URLs specify a document access method (a client-server protocol), a server machine, and the location of a document on that machine, e.g. ftp://ftp.monash.edu.au/pub

A uniform resource identifier (URI) is either a uniform resource locator (URL), a uniform resource name (URN), or both. URL is an acronym for Uniform Resource Locator: a reference (an address) to a resource on the Internet. A URL has two main components. For the URL http://example.com, the protocol identifier is http and the resource name is example.com.

A uniform resource locator (URL) is a reference to a resource that specifies the location of the resource on a computer network and a mechanism for retrieving it. A URL is a specific type of uniform resource identifier (URI),[1] although many people use the two terms interchangeably.[2] A URL implies the means to access an indicated resource, which is not true of every URI.[2][3] URLs occur most commonly to reference web pages (http), but are also used for file transfer (ftp), email (mailto), database access (JDBC), and many other applications. Most web browsers display the URL of a web page above the page in an address bar. A typical URL such as http://www.example.com/index.html indicates the protocol type (http), the domain name (www.example.com), and the specific web page (index.html).
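Java's standard library can split a URL into exactly these components; the class name and the URL below are illustrative only:

```java
import java.net.MalformedURLException;
import java.net.URL;

// java.net.URL parses a URL string into the parts described above:
// access method (protocol), server machine (host), and document location (path).
public class UrlParts {
    public static String describe(String spec) {
        try {
            URL url = new URL(spec);
            return url.getProtocol() + " | " + url.getHost() + " | " + url.getPath();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Not a URL: " + spec, e);
        }
    }

    public static void main(String[] args) {
        // → "http | www.example.com | /index.html"
        System.out.println(describe("http://www.example.com/index.html"));
    }
}
```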
7
HTTP - Design HTTP is a stateless, reliable protocol: each request from a client is handled reliably and then the connection is broken. The web is an excellent example of a set of protocols stretched way beyond their original scope, with a huge series of patches at all levels to try and fix problems.

HTTP is stateless and connectionless at the application level, even though TCP, its usual transport, is connection-oriented at the transport level. HTTP communication usually takes place over TCP/IP connections. The default port is TCP 80 [19], but other ports can be used. This does not preclude HTTP from being implemented on top of any other protocol on the Internet, or on other networks. HTTP only presumes a reliable transport; any protocol that provides such guarantees can be used, and the mapping of the HTTP/1.1 request and response structures onto the transport data units of the protocol in question is outside the scope of the specification.

In HTTP/0.9 and 1.0, the connection is closed after a single request/response pair. In HTTP/1.1 a keep-alive mechanism was introduced, where a connection can be reused for more than one request. Such persistent connections reduce request latency perceptibly, because the client does not need to repeat the TCP three-way handshake after the first request has been sent. Another positive side effect is that, in general, the connection becomes faster with time due to TCP's slow-start mechanism.

Why is HTTP known as a stateless protocol? Because each command is executed independently, without any knowledge of the commands that came before it. This is the main reason it is difficult to implement web sites that react intelligently to user input. This shortcoming of HTTP is addressed in a number of technologies, including ActiveX, Java, JavaScript and cookies.

The concept of "connectionless" is not limited to the transport layer but applies throughout the protocol stack. A protocol is connection-oriented if each party maintains communication state between multiple requests or packets, so HTTP is clearly a connectionless protocol; in general, stateless protocols are by definition connectionless. HTTP/1.1 introduced the notion of "persistent connections", where the underlying connection may be kept alive between requests, and requests may also be "pipelined" (multiple requests sent before responses are received). However, HTTP/1.1 makes no guarantee that the connection will remain open, and does not maintain any state between request/response pairs, so it is still a connectionless protocol.
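The one-request-per-connection model described above can be sketched in Java with a throwaway loopback server; everything here (class name, path, reply text) is a hypothetical illustration, not part of any real server:

```java
import java.io.*;
import java.net.*;
import java.nio.charset.StandardCharsets;

public class OneShotHttp {
    // Serve exactly one HTTP/1.0-style exchange on a loopback socket, then close
    // the connection -- the "one request/response pair, then the connection is broken" model.
    public static String roundTrip(String path) {
        try {
            ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
            Thread t = new Thread(() -> {
                try (Socket s = server.accept()) {
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
                    in.readLine();                                  // request line, e.g. "GET /hello HTTP/1.0"
                    String h;
                    while ((h = in.readLine()) != null && !h.isEmpty()) { }  // skip headers
                    s.getOutputStream().write(
                            "HTTP/1.0 200 OK\r\nContent-Type: text/plain\r\n\r\nHello"
                                    .getBytes(StandardCharsets.US_ASCII));
                } catch (IOException ignored) { }                   // sketch: no real error handling
            });
            t.start();
            String reply;
            try (Socket c = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                c.getOutputStream().write(
                        ("GET " + path + " HTTP/1.0\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
                ByteArrayOutputStream buf = new ByteArrayOutputStream();
                c.getInputStream().transferTo(buf);                 // read until the server closes the socket
                reply = buf.toString(StandardCharsets.US_ASCII);
            }
            t.join();
            server.close();
            return reply;
        } catch (IOException | InterruptedException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("/hello"));
    }
}
```

Note that the client knows the response is complete only because the server closes the connection; a persistent (keep-alive) connection would instead rely on Content-Length to delimit the body.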
8
HTTP - Versions There are 3 versions of HTTP:

Version 0.9 – totally obsolete
Version 1.0 – almost obsolete
Version 1.1 – current
Version 2 – latest

An object-oriented version (HTTP-NG) was under development to replace HTTP/1.1, but seems to have vanished. Each version must understand all earlier versions.

HTTP/2 (originally named HTTP/2.0) is the second major version of the HTTP network protocol used by the World Wide Web. The Working Group presented HTTP/2 to the IESG for consideration as a Proposed Standard in December 2014, and the IESG approved it for publication as a Proposed Standard on 17 February 2015. The primary focus of HTTP/2 is improving transport performance, enabling both lower latency and higher throughput.

Don't get confused between HTTP and HTML versions – they are different! The content of the messages (HTML) has been through a large number of versions, all different; as of March 2013, HTML 5.0 was still under development, with completion expected in 2014.

The Hypertext Transfer Protocol (HTTP) is one of the most ubiquitous and widely adopted application protocols on the Internet: it is the common language between clients and servers, enabling the modern web. From its simple beginnings as a single keyword and document path, it has become the protocol of choice not just for browsers, but for virtually every Internet-connected software and hardware application. A full discussion of the varying HTTP semantics is outside the scope of this unit, but an understanding of the key design changes of HTTP, and the motivations behind each, gives the necessary background for discussing HTTP performance, especially in the context of the many improvements in HTTP/2.
9
HTTP 0.9 Request:

Request = Simple-Request
Simple-Request = "GET" SP Request-URI CRLF

Response:

Response = Simple-Response
Simple-Response = [Entity-Body]

HTTP/0.9 is the first version of HTTP, with a simple request and a simple response. It was written to meet Tim Berners-Lee's requirements for transporting HTML pages at CERN, so it implements only one very simple request: the one to get a document (the GET method). SP means "single space".

Advantages of HTTP/0.9: it does not rely on a particular transport layer (layer 4: TCP or UDP) and it can be used to carry any kind of document. There is nothing simpler than HTTP/0.9.

Restrictions: HTTP/0.9 obviously has some limitations, which are partially solved by HTTP/1.0 and then HTTP/1.1. The first drawback is that the connection between the client and the server is closed each time after the server has replied to a request. The consequences are:
- the client must open a connection for every document to be downloaded, especially for images. With a web page that contains 3 images, the client must open 4 connections in a row, and opening a connection is a slow process: the user can only wait;
- the network is congested by requests to open connections;
- web browsers open several connections at the same time (up to 4 for Netscape), so servers are also congested.

HTTP/0.9 is also not able to manage caches, so document transfers are not optimised at all. Data can be sent to a server only by using a specific GET request, which limits the amount of data that can be sent; note also that this data is written in the URI, not hidden or encrypted, so there are confidentiality problems. The user is aware of errors (he can see a weird web page), but the web browser has no way to know that something went wrong. These shortcomings are improved on by HTTP/1.0.

Why "HTTP/0.9"? When Tim Berners-Lee invented this protocol, there was no version number. HTTP/0.9 got its number only when HTTP/1.0 was written (HTTP/1.0 is the first HTTP protocol described in an RFC) and it was decided that the new version would be called HTTP/1.0.

Example: every HTTP/0.9 request is a single GET line naming the document. The requested document arrives straight after the request has been received, and then the connection is closed by the server. In HTTP/0.9 there is only the GET method; everything is performed using it, even sending data to the server (the requested URI then ends with something like ?var1=foo, where what follows the first question mark means "the variable called var1 is set to 'foo'").
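The Simple-Request grammar above can be captured in a few lines of Java; this is a hypothetical sketch (class and method names invented for illustration), not library code:

```java
// A sketch of the HTTP/0.9 grammar from the slide: a Simple-Request is just
// "GET" SP Request-URI CRLF, and the response is the bare entity body.
public class SimpleRequest {
    public static String build(String requestUri) {
        return "GET " + requestUri + "\r\n";       // "GET" SP Request-URI CRLF
    }

    // Check a raw request against the grammar: method must be GET,
    // exactly one space, a non-empty URI, then CRLF.
    public static boolean isValid(String raw) {
        return raw.matches("GET \\S+\r\n");
    }

    public static void main(String[] args) {
        System.out.print(build("/index.html"));    // → GET /index.html
    }
}
```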
10
HTTP 1.0 This version added much more information to the requests and responses. Rather than "grow" the 0.9 format, it was just left alongside the new version. SP means "single space".

HTTP/1.0, the successor of HTTP/0.9, brings several improvements. It eases web surfing: it is able to work with cache systems; it is possible to send data to a server (via the new POST method); HTTP/1.0 is able to report when a request did not work (the famous "404 Not Found" message); and it allows users to authenticate.

Example of an HTTP/1.0 exchange. Compared with HTTP/0.9, HTTP/1.0 brings a real innovation in the form of the request, and especially the form of the reply:

$ telnet www2.themanualpage.org 80
Trying...
Connected to www2.themanualpage.org.
Escape character is '^]'.
GET <document-path> HTTP/1.0
User-Agent: Mozilla/4.03 [fr]

HTTP/1.1 200 OK
Date: Thu, 20 Jul :43:02 GMT
Server: Apache/ (Unix) PHP/3.0.9
Last-Modified: Mon, 17 Jul :55:03 GMT
Content-Type: text/plain

Hello
Connection closed by foreign host.

We immediately notice that there is much more information in this request than in an HTTP/0.9 request. First, notice the "HTTP/1.0" at the end of the first line: it tells the server that we would like to speak HTTP/1.0 for this request. It is the same with HTTP/1.1, and in all likelihood will be with any newer version of HTTP. (In this example the server actually replies in HTTP/1.1; this can happen.) The second (and very important) difference is that we also say what web browser we are using: we are adding data to the request. We will see later what other kinds of data go into what we call the request's header. The third difference is also very important: the server says a lot of things (with directives) before sending what we are waiting for, in another header. Finally, at last, we get the file (the so-called entity), and straight after, the server closes the connection.
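The "Name: value" header lines seen in a session like the one above can be parsed with a small sketch; the class is a hypothetical helper for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A sketch of parsing HTTP header lines: each "Name: value" line,
// up to the blank line that ends the header section.
public class HeaderParser {
    public static Map<String, String> parse(String rawHeaders) {
        Map<String, String> headers = new LinkedHashMap<>();
        for (String line : rawHeaders.split("\r\n")) {
            if (line.isEmpty()) break;        // blank line ends the header section
            int colon = line.indexOf(':');
            if (colon < 0) continue;          // skip the request/status line
            headers.put(line.substring(0, colon).trim(),
                        line.substring(colon + 1).trim());
        }
        return headers;
    }

    public static void main(String[] args) {
        Map<String, String> h =
            parse("User-Agent: Mozilla/4.03 [fr]\r\nContent-Type: text/plain\r\n\r\n");
        System.out.println(h.get("User-Agent"));  // → Mozilla/4.03 [fr]
    }
}
```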
11
HTTP 1.0 The format of requests from client to server:

Request = Simple-Request | Full-Request
Simple-Request = "GET" SP Request-URI CRLF
Full-Request = Request-Line *(General-Header | Request-Header | Entity-Header) CRLF [Entity-Body]

SP means "single space". Here we have the Simple-Request and also a Full-Request.
12
HTTP 1.0 A Simple-Request is an HTTP/0.9 request and must be replied to by a Simple-Response. A Request-Line has the format

Request-Line = Method SP Request-URI SP HTTP-Version CRLF

where

Method = "GET" | "HEAD" | "POST" | extension-method

e.g. GET <document-path> HTTP/1.0

SP means "single space". Note that there are now more methods: from the single GET of version 0.9 we have GET, HEAD and POST in 1.0.
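The Request-Line grammar can be turned into a small parser; this is an illustrative sketch (class and field names invented), not library code:

```java
// A sketch of parsing a Full-Request's Request-Line per the grammar above:
// Method SP Request-URI SP HTTP-Version CRLF.
public class RequestLine {
    public final String method, requestUri, httpVersion;

    private RequestLine(String m, String u, String v) {
        method = m; requestUri = u; httpVersion = v;
    }

    public static RequestLine parse(String line) {
        String[] parts = line.trim().split(" ");   // trim() drops the trailing CRLF
        if (parts.length != 3) {
            throw new IllegalArgumentException("Not a Request-Line: " + line);
        }
        return new RequestLine(parts[0], parts[1], parts[2]);
    }

    public static void main(String[] args) {
        RequestLine r = parse("GET /index.html HTTP/1.0\r\n");
        System.out.println(r.method + " " + r.requestUri + " " + r.httpVersion);
    }
}
```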
13
HTTP 1.0 The format of responses from server to client:

Response = Simple-Response | Full-Response
Simple-Response = [Entity-Body]
Full-Response = Status-Line *(General-Header | Response-Header | Entity-Header) CRLF [Entity-Body]

SP means "single space".
14
HTTP 1.0 The Status-Line gives information about the fate of the request:

Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

e.g. HTTP/1.0 200 OK

SP means "single space"; CRLF is Carriage Return and Line Feed. CRLF is a very significant sequence of characters for programmers: these two special characters represent the End Of Line (EOL) marker for many Internet protocols, including but not limited to MIME (e-mail), NNTP (newsgroups) and, more importantly, HTTP. When programmers write code for web applications they split headers based on where the CRLF is found. If a malicious user is able to inject his own CRLF sequence into an HTTP stream (CRLF injection), he can maliciously control the way a web application functions.
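A Status-Line can be split into its three parts the same way; note that the third field, the Reason-Phrase, may itself contain spaces, so only the first two spaces separate fields. This parser is an illustrative sketch:

```java
// A sketch of splitting a Status-Line per the grammar above:
// HTTP-Version SP Status-Code SP Reason-Phrase CRLF.
public class StatusLine {
    public static int statusCode(String line) {
        // Limit the split to 3 parts so a multi-word Reason-Phrase stays intact.
        String[] parts = line.trim().split(" ", 3);
        return Integer.parseInt(parts[1]);
    }

    public static String reasonPhrase(String line) {
        return line.trim().split(" ", 3)[2];
    }

    public static void main(String[] args) {
        String line = "HTTP/1.0 404 Not found\r\n";
        System.out.println(statusCode(line) + " / " + reasonPhrase(line));  // → 404 / Not found
    }
}
```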
15
HTTP 1.0 Status codes:

Status-Code = "200" ; OK
            | "201" ; Created
            | "202" ; Accepted
            | "204" ; No Content
            | "301" ; Moved Permanently
            | "302" ; Moved Temporarily
            | "304" ; Not Modified
            | "400" ; Bad Request
            | "401" ; Unauthorised
            | "403" ; Forbidden
            | "404" ; Not Found
            | "500" ; Internal Server Error
            | "501" ; Not Implemented
            | "502" ; Bad Gateway
            | "503" ; Service Unavailable
            | extension-code

The first digit gives the class of the code:
1xx Informational: request received, continuing process.
2xx Success: the action requested by the client was received, understood, accepted and processed successfully.
3xx Redirection: the client must take additional action to complete the request.
4xx Client Error: the request seems to contain an error on the client's side.
5xx Server Error: the server failed to fulfill an apparently valid request.

1xx Informational: Request received, continuing process.[2] This class of status code indicates a provisional response, consisting only of the Status-Line and optional headers, and is terminated by an empty line. Since HTTP/1.0 did not define any 1xx status codes, servers must not send a 1xx response to an HTTP/1.0 client except under experimental conditions.

100 Continue: The server has received the request headers, and the client should proceed to send the request body (in the case of a request for which a body needs to be sent; for example, a POST request). If the request body is large, sending it to a server when the request has already been rejected based upon inappropriate headers is inefficient.
To have a server check if the request could be accepted based on the request's headers alone, a client must send Expect: 100-continue as a header in its initial request[2] and check if a 100 Continue status code is received in response before continuing (or receive 417 Expectation Failed and not continue).[2]

101 Switching Protocols: The requester has asked the server to switch protocols and the server is acknowledging that it will do so.[2]

102 Processing (WebDAV; RFC 2518): As a WebDAV request may contain many sub-requests involving file operations, it may take a long time to complete the request. This code indicates that the server has received and is processing the request, but no response is available yet.[3] This prevents the client from timing out and assuming the request was lost.

2xx Success: This class of status codes indicates the action requested by the client was received, understood, accepted and processed successfully.

200 OK: Standard response for successful HTTP requests. The actual response will depend on the request method used. In a GET request, the response will contain an entity corresponding to the requested resource. In a POST request the response will contain an entity describing or containing the result of the action.[2]

201 Created: The request has been fulfilled and resulted in a new resource being created.[2]

202 Accepted: The request has been accepted for processing, but the processing has not been completed. The request might or might not eventually be acted upon, as it might be disallowed when processing actually takes place.[2]

203 Non-Authoritative Information (since HTTP/1.1): The server successfully processed the request, but is returning information that may be from another source.[2]

204 No Content: The server successfully processed the request, but is not returning any content.[2]

205 Reset Content: The server successfully processed the request, but is not returning any content. Unlike a 204 response, this response requires that the requester reset the document view.[2]

206 Partial Content: The server is delivering only part of the resource due to a range header sent by the client. The range header is used by tools like wget to enable resuming of interrupted downloads, or to split a download into multiple simultaneous streams.[2]

207 Multi-Status (WebDAV; RFC 4918): The message body that follows is an XML message and can contain a number of separate response codes, depending on how many sub-requests were made.[4]

208 Already Reported (WebDAV; RFC 5842): The members of a DAV binding have already been enumerated in a previous reply to this request, and are not being included again.

226 IM Used (RFC 3229): The server has fulfilled a GET request for the resource, and the response is a representation of the result of one or more instance-manipulations applied to the current instance.[6]

250 Low on Storage Space (RTSP; RFC 2326): The server returns this warning after receiving a RECORD request that it may not be able to fulfill completely due to insufficient storage space. If possible, the server should use the Range header to indicate what time period it may still be able to record. Since other processes on the server may be consuming storage space simultaneously, a client should take this only as an estimate.[5]

3xx Redirection: The client must take additional action to complete the request.[2] This class of status code indicates that further action needs to be taken by the user agent to fulfill the request. The action required may be carried out by the user agent without interaction with the user if and only if the method used in the second request is GET or HEAD. A user agent should not automatically redirect a request more than five times, since such redirections usually indicate an infinite loop.

300 Multiple Choices: Indicates multiple options for the resource that the client may follow. It could, for instance, be used to present different format options for video, list files with different extensions, or word-sense disambiguation.[2]

301 Moved Permanently: This and all future requests should be directed to the given URI.[2]

302 Found: This is an example of industry practice contradicting the standard.[2] The HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original describing phrase was "Moved Temporarily"),[7] but popular browsers implemented 302 with the functionality of a 303 See Other. Therefore, HTTP/1.1 added status codes 303 and 307 to distinguish between the two behaviours.[8] However, some web applications and frameworks use the 302 status code as if it were the 303.[9]

303 See Other (since HTTP/1.1): The response to the request can be found under another URI using a GET method. When received in response to a POST (or PUT/DELETE), it should be assumed that the server has received the data and the redirect should be issued with a separate GET message.[2]

304 Not Modified: Indicates that the resource has not been modified since the version specified by the request headers If-Modified-Since or If-None-Match.[2] This means that there is no need to retransmit the resource, since the client still has a previously downloaded copy.

305 Use Proxy (since HTTP/1.1): The requested resource is only available through a proxy, whose address is provided in the response.[2] Many HTTP clients (such as Mozilla[10] and Internet Explorer) do not correctly handle responses with this status code, primarily for security reasons.[citation needed]

306 Switch Proxy: No longer used.[2] Originally meant "Subsequent requests should use the specified proxy."[11]

307 Temporary Redirect (since HTTP/1.1): In this case, the request should be repeated with another URI; however, future requests should still use the original URI.[2] In contrast to how 302 was historically implemented, the request method is not allowed to be changed when reissuing the original request. For instance, a POST request should be repeated using another POST request.[12]

308 Permanent Redirect (approved as experimental RFC):[13] The request and all future requests should be repeated using another URI. 307 and 308 (as proposed) parallel the behaviours of 302 and 301, but do not allow the HTTP method to change. So, for example, submitting a form to a permanently redirected resource may continue smoothly.

4xx Client Error: The 4xx class of status code is intended for cases in which the client seems to have erred. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and whether it is a temporary or permanent condition. These status codes are applicable to any request method. User agents should display any included entity to the user.

400 Bad Request: The request cannot be fulfilled due to bad syntax.[2]

401 Unauthorized: Similar to 403 Forbidden, but specifically for use when authentication is required and has failed or has not yet been provided.[2] The response must include a WWW-Authenticate header field containing a challenge applicable to the requested resource. See Basic access authentication and Digest access authentication.

402 Payment Required: Reserved for future use.[2] The original intention was that this code might be used as part of some form of digital cash or micropayment scheme, but that has not happened, and this code is not usually used. As an example of its use, however, Apple's defunct MobileMe service generated a 402 error if the MobileMe account was delinquent.[citation needed] In addition, YouTube uses this status if a particular IP address has made excessive requests, and requires the person to enter a CAPTCHA.

403 Forbidden: The request was a valid request, but the server is refusing to respond to it.[2] Unlike a 401 Unauthorized response, authenticating will make no difference.[2] On servers where authentication is required, this commonly means that the provided credentials were successfully authenticated but that the credentials still do not grant the client permission to access the resource (e.g. a recognized user attempting to access restricted content).

404 Not Found: The requested resource could not be found but may be available again in the future.[2] Subsequent requests by the client are permissible.

405 Method Not Allowed: A request was made of a resource using a request method not supported by that resource;[2] for example, using GET on a form which requires data to be presented via POST, or using PUT on a read-only resource.

406 Not Acceptable: The requested resource is only capable of generating content not acceptable according to the Accept headers sent in the request.[2]

407 Proxy Authentication Required: The client must first authenticate itself with the proxy.[2]

408 Request Timeout: The server timed out waiting for the request.[2] According to the W3 HTTP specification: "The client did not produce a request within the time that the server was prepared to wait. The client MAY repeat the request without modifications at any later time."
409 Conflict Indicates that the request could not be processed because of conflict in the request, such as an edit conflict.[2] 410 Gone Indicates that the resource requested is no longer available and will not be available again.[2] This should be used when a resource has been intentionally removed and the resource should be purged. Upon receiving a 410 status code, the client should not request the resource again in the future. Clients such as search engines should remove the resource from their indices. Most use cases do not require clients and search engines to purge the resource, and a "404 Not Found" may be used instead. 411 Length Required The request did not specify the length of its content, which is required by the requested resource.[2] 412 Precondition Failed The server does not meet one of the preconditions that the requester put on the request.[2] 413 Request Entity Too Large The request is larger than the server is willing or able to process.[2] 414 Request-URI Too Long The URI provided was too long for the server to process.[2] 415 Unsupported Media Type The request entity has a media type which the server or resource does not support.[2] For example, the client uploads an image as image/svg+xml, but the server requires that images use a different format. 416 Requested Range Not Satisfiable The client has asked for a portion of the file, but the server cannot supply that portion.[2] For example, if the client asked for a part of the file that lies beyond the end of the file.[2] 417 Expectation Failed The server cannot meet the requirements of the Expect request-header field.[2] 418 I'm a teapot (RFC 2324) This code was defined in 1998 as one of the traditional IETF April Fools' jokes, in RFC 2324, Hyper Text Coffee Pot Control Protocol, and is not expected to be implemented by actual HTTP servers. 
420 Enhance Your Calm (Twitter) Not part of the HTTP standard, but returned by the Twitter Search and Trends API when the client is being rate limited.[14] Other services may wish to implement the 429 Too Many Requests response code instead. 422 Unprocessable Entity (WebDAV; RFC 4918) The request was well-formed but was unable to be followed due to semantic errors.[4] 423 Locked (WebDAV; RFC 4918) The resource that is being accessed is locked.[4] 424 Failed Dependency (WebDAV; RFC 4918) The request failed due to failure of a previous request (e.g. a PROPPATCH).[4] 424 Method Failure (WebDAV)[15] Indicates the method was not executed on a particular resource within its scope because some part of the method's execution failed causing the entire method to be aborted. 425 Unordered Collection (Internet draft) Defined in drafts of "WebDAV Advanced Collections Protocol",[16] but not present in "Web Distributed Authoring and Versioning (WebDAV) Ordered Collections Protocol".[17] 426 Upgrade Required (RFC 2817) The client should switch to a different protocol such as TLS/1.0.[18] 428 Precondition Required (RFC 6585) The origin server requires the request to be conditional. Intended to prevent "the 'lost update' problem, where a client GETs a resource's state, modifies it, and PUTs it back to the server, when meanwhile a third party has modified the state on the server, leading to a conflict."[19] 429 Too Many Requests (RFC 6585) The user has sent too many requests in a given amount of time. Intended for use with rate limiting schemes.[19] 431 Request Header Fields Too Large (RFC 6585) The server is unwilling to process the request because either an individual header field, or all the header fields collectively, are too large.[19] 444 No Response (Nginx) Used in Nginx logs to indicate that the server has returned no information to the client and closed the connection (useful as a deterrent for malware). 449 Retry With (Microsoft) A Microsoft extension. 
The request should be retried after performing the appropriate action.[20] Often search-engines or custom applications will ignore required parameters. Where no default action is appropriate, the Aviongoo website sends a "HTTP/ Retry with valid parameters: param1, param2, . . ." response. The applications may choose to learn, or not. 450 Blocked by Windows Parental Controls (Microsoft) A Microsoft extension. This error is given when Windows Parental Controls are turned on and are blocking access to the given webpage.[21] 451 Parameter Not Understood (RTSP) The recipient of the request does not support one or more parameters contained in the request.[5] 451 Unavailable For Legal Reasons (Internet draft) Defined in the internet draft "A New HTTP Status Code for Legally-restricted Resources".[22] Intended to be used when resource access is denied for legal reasons, e.g. censorship or government-mandated blocked access. A reference to the 1953 dystopian novel Fahrenheit 451, where books are outlawed.[23] 451 Redirect (Microsoft) Used in Exchange ActiveSync if there either is a more efficient server to use or the server can't access the users' mailbox.[24] The client is supposed to re-run the HTTP Autodiscovery protocol to find a better suited server.[25] 452 Conference Not Found (RTSP) The conference indicated by a Conference header field is unknown to the media server.[5] 453 Not Enough Bandwidth (RTSP) The request was refused because there was insufficient bandwidth. This may, for example, be the result of a resource reservation failure.[5] 454 Session Not Found (RTSP) The RTSP session identifier in the Session header is missing, invalid, or has timed out.[5] 455 Method Not Valid in This State (RTSP) The client or server cannot process this request in its current state. The response SHOULD contain an Allow header to make error recovery easier.[5] 456 Header Field Not Valid for Resource (RTSP) The server could not act on a required request header. 
For example, if PLAY contains the Range header field but the stream does not allow seeking.[5] 457 Invalid Range (RTSP) The Range value given is out of bounds, e.g., beyond the end of the presentation.[5] 458 Parameter Is Read-Only (RTSP) The parameter to be set by SET_PARAMETER can be read but not modified.[5] 459 Aggregate Operation Not Allowed (RTSP) The requested method may not be applied on the URL in question since it is an aggregate (presentation) URL. The method may be applied on a stream URL.[5] 460 Only Aggregate Operation Allowed (RTSP) The requested method may not be applied on the URL in question since it is not an aggregate (presentation) URL. The method may be applied on the presentation URL.[5] 461 Unsupported Transport (RTSP) The Transport field did not contain a supported transport specification.[5] 462 Destination Unreachable (RTSP) The data transmission channel could not be established because the client address could not be reached. This error will most likely be the result of a client attempt to place an invalid Destination parameter in the Transport field.[5] 494 Request Header Too Large (Nginx) Nginx internal code similar to 431 but it was introduced earlier.[26][original research?] 495 Cert Error (Nginx) Nginx internal code used when SSL client certificate error occurred to distinguish it from 4XX in a log and an error page redirection. 496 No Cert (Nginx) Nginx internal code used when client didn't provide certificate to distinguish it from 4XX in a log and an error page redirection. 497 HTTP to HTTPS (Nginx) Nginx internal code used for the plain HTTP requests that are sent to HTTPS port to distinguish it from 4XX in a log and an error page redirection. 
499 Client Closed Request (Nginx) Used in Nginx logs to indicate when the connection has been closed by client while the server is still processing its request, making server unable to send a status code back.[27] 5xx Server Error The server failed to fulfill an apparently valid request.[2] Response status codes beginning with the digit "5" indicate cases in which the server is aware that it has encountered an error or is otherwise incapable of performing the request. Except when responding to a HEAD request, the server should include an entity containing an explanation of the error situation, and indicate whether it is a temporary or permanent condition. Likewise, user agents should display any included entity to the user. These response codes are applicable to any request method. 500 Internal Server Error A generic error message, given when no more specific message is suitable.[2] 501 Not Implemented The server either does not recognize the request method, or it lacks the ability to fulfill the request.[2] 502 Bad Gateway The server was acting as a gateway or proxy and received an invalid response from the upstream server.[2] 503 Service Unavailable The server is currently unavailable (because it is overloaded or down for maintenance).[2] Generally, this is a temporary state. 504 Gateway Timeout The server was acting as a gateway or proxy and did not receive a timely response from the upstream server.[2] 505 HTTP Version Not Supported The server does not support the HTTP protocol version used in the request.[2] 506 Variant Also Negotiates (RFC 2295) Transparent content negotiation for the request results in a circular reference.[28] 507 Insufficient Storage (WebDAV; RFC 4918) The server is unable to store the representation needed to complete the request.[4] 508 Loop Detected (WebDAV; RFC 5842) The server detected an infinite loop while processing the request (sent in lieu of 208). 
509 Bandwidth Limit Exceeded (Apache bw/limited extension) This status code, while used by many servers, is not specified in any RFCs. 510 Not Extended (RFC 2774) Further extensions to the request are required for the server to fulfill it.[29] 511 Network Authentication Required (RFC 6585) The client needs to authenticate to gain network access. Intended for use by intercepting proxies used to control access to the network (e.g. "captive portals" used to require agreement to Terms of Service before granting full Internet access via a Wi-Fi hotspot).[19] 551 Option not supported (RTSP) An option given in the Require or the Proxy-Require fields was not supported. The Unsupported header should be returned stating the option for which there is no support.[5] 598 Network read timeout error (Unknown) This status code is not specified in any RFCs, but is used by Microsoft HTTP proxies to signal a network read timeout behind the proxy to a client in front of the proxy.[citation needed] 599 Network connect timeout error (Unknown) This status code is not specified in any RFCs, but is used by Microsoft HTTP proxies to signal a network connect timeout behind the proxy to a client in front of the proxy.[citation needed]
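The class of a status code follows directly from its first digit, which is why clients can handle codes they have never seen. A minimal Java sketch (the class and method names here are my own, not from any library):

```java
public class StatusClass {
    /** Maps an HTTP status code to its class, based on the leading digit. */
    public static String describe(int code) {
        switch (code / 100) {
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client Error";
            case 5: return "Server Error";
            default: return "Unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(404 + " -> " + describe(404)); // prints 404 -> Client Error
    }
}
```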
16
HTTP 1.0 The Entity-Header contains useful information about the Entity-Body to follow Entity-Header = Allow | Content-Encoding | Content-Length | Content-Type | Expires | Last-Modified | extension-header
17
HTTP 1.0 An example response: HTTP/1.1 200 OK Date: Fri, 29 Aug 2012 00:59:56 GMT
Server: Apache/ (Unix) Accept-Ranges: bytes Content-Length: 1595 Connection: close Content-Type: text/html; charset=ISO-8859-1
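A response head like the one above is just the status line followed by `Name: value` lines separated by CRLF, so it can be split into a map with a few lines of Java. A sketch (the class name is mine; no handling of folded or repeated headers):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderParser {
    /** Splits a raw response head (status line plus headers) into a name -> value map. */
    public static Map<String, String> parse(String head) {
        Map<String, String> headers = new LinkedHashMap<>();
        String[] lines = head.split("\r?\n");
        for (int i = 1; i < lines.length; i++) {       // line 0 is the status line
            int colon = lines[i].indexOf(':');
            if (colon > 0) {
                headers.put(lines[i].substring(0, colon).trim(),
                            lines[i].substring(colon + 1).trim());
            }
        }
        return headers;
    }
}
```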
18
HTTP 1.1 HTTP 1.1 fixes many problems with HTTP 1.0, but it is more complex. This version extends and refines the options available to HTTP, e.g. There are more commands, such as TRACE and CONNECT You should use absolute URLs, particularly when connecting via proxies, e.g. GET http://www.example.com/index.html HTTP/1.1 There are more attributes, such as If-Modified-Since, also for use by proxies. 9.8 TRACE The TRACE method is used to invoke a remote, application-layer loop-back of the request message. The final recipient of the request SHOULD reflect the message received back to the client as the entity-body of a 200 (OK) response. TRACE allows the client to see what is being received at the other end of the request chain and use that data for testing or diagnostic information. 9.9 CONNECT This specification reserves the method name CONNECT for use with a proxy that can dynamically switch to being a tunnel (e.g. SSL tunnelling). If-Modified-Since The If-Modified-Since request tells a server (and, via the response, a search-engine spider) one of two things about a webpage: either the page has not changed, so there is no need to download it again, or the page has changed, so it should be downloaded again because there is new information. 14.25 If-Modified-Since The If-Modified-Since request-header field is used with a method to make it conditional: if the requested variant has not been modified since the time specified in this field, an entity will not be returned from the server; instead, a 304 (Not Modified) response will be returned without any message-body. If-Modified-Since = "If-Modified-Since" ":" HTTP-date An example of the field is: If-Modified-Since: Sat, 29 Oct 1994 19:43:31 GMT A GET method with an If-Modified-Since header and no Range header requests that the identified entity be transferred only if it has been modified since the date given by the If-Modified-Since header. 
The algorithm for determining this includes the following cases: a) If the request would normally result in anything other than a 200 (OK) status, or if the passed If-Modified-Since date is invalid, the response is exactly the same as for a normal GET. A date which is later than the server's current time is invalid. b) If the variant has been modified since the If-Modified-Since date, the response is exactly the same as for a normal GET. c) If the variant has not been modified since a valid If- Modified-Since date, the server SHOULD return a 304 (Not Modified) response.
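Cases a) to c) above reduce to two comparisons. A server-side sketch in Java, assuming all times are epoch milliseconds (the class and method names are mine, not from any servlet API):

```java
public class ConditionalGet {
    /** Returns true if a 304 Not Modified should be sent instead of the entity. */
    public static boolean shouldSend304(long lastModifiedMillis,
                                        long ifModifiedSinceMillis,
                                        long serverNowMillis) {
        // case a: a date later than the server's current time is invalid -> normal GET
        if (ifModifiedSinceMillis > serverNowMillis) return false;
        // case b: the variant has been modified since the given date -> normal GET
        if (lastModifiedMillis > ifModifiedSinceMillis) return false;
        // case c: not modified since a valid date -> 304 Not Modified
        return true;
    }
}
```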
19
HTTP 1.1 The changes include
hostname identification (allows virtual hosts) content negotiation (multiple languages) persistent connections (reduces TCP overheads - this is very messy) chunked transfers byte ranges (request parts of documents) proxy support The 0.9 protocol was described in one page; the 1.0 protocol took about 20 pages; 1.1 takes 120 pages. Content negotiation MAY be used to select the appropriate response format. If no response body is included, the response MUST include a Content-Length field with a field-value of "0". HTTP persistent connection, also called HTTP keep-alive or HTTP connection reuse, is the idea of using a single TCP connection to send and receive multiple HTTP requests/responses, as opposed to opening a new connection for every single request/response pair. Chunked transfer encoding is a data transfer mechanism in version 1.1 of the Hypertext Transfer Protocol (HTTP) in which data is sent in a series of "chunks". It uses the Transfer-Encoding HTTP header in place of the Content-Length header, which the protocol would otherwise require. Byte serving is the process of sending only a portion of an HTTP/1.1 message from a server to a client. Byte serving uses the Range HTTP request header and the Accept-Ranges and Content-Range HTTP response headers. HTTP/1.1 defines the "close" connection option for the sender to signal that the connection will be closed after completion of the response. For example, Connection: close in either the request or the response header fields indicates that the connection SHOULD NOT be considered 'persistent' (section 8.1) after the current request/response is complete. HTTP/1.1 applications that do not support persistent connections MUST include the "close" connection option in every message. 
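Two of these changes show up in the request text itself: the mandatory Host header (hostname identification for virtual hosts) and the Connection option. A sketch that builds such a request as a string; the class name and the example host/path are mine, and actually sending the text over a java.net.Socket is left as an exercise:

```java
public class RawHttp {
    /** Builds a minimal HTTP/1.1 request. Host is mandatory in 1.1 (virtual hosts);
        Connection: close opts out of the default persistent connection. */
    public static String request(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";                        // blank line ends the header section
    }

    public static void main(String[] args) {
        System.out.print(request("www.example.com", "/index.html"));
    }
}
```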
HTTP 1.2 There is no HTTP 1.2: HTTP 1.1 was never followed by a "1.2" release. The description sometimes attached to it ("improved support for resource hierarchies and text-menu interfaces", functioning "much like a mountable read-only global network file system" of hierarchical hyper-linkable menus whose items and titles are controlled by the server administrator) actually describes the Gopher protocol, not any version of HTTP.
20
HTTP 1.1 Character Set HTTP messages use the US ASCII character set
Some parts of a message need not be understood by the HTTP client or server, but are intended for other parts of the application. These "content" parts can be in any character set.
21
HTTP 1.1 Requests The set of requests has been expanded to
"OPTIONS" "GET" "HEAD" "POST" "PUT" "DELETE" "TRACE" "CONNECT" extension-method HTTP 1.2 Released with Improved Support for Hierarchies and Text-Menu Interfaces With the new 1.2 version, HTTP gets a much stronger support for resource hierarchies and gets better support for text menu interfaces, which are well-suited to computing environments like mobile clients. As part of its design goals, HTTP 1.2 functions and appears much like a mountable read-only global network file system. A system supporting this latest version, consists of a series of hierarchical hyper-linkable menus. The choice of menu items and titles is controlled by the administrator of the server. 1 OPTIONS -- Describes the communication options for the target resource. 2 GET -- The GET method is used to retrieve information from the given server using a given URI. Requests using GET should only retrieve data and should have no other effect on the data. 3 HEAD --Same as GET, but transfers the status line and header section only. 4 POST -- A POST request is used to send data to the server, for example, customer information, file upload, etc. using HTML forms. 5 PUT -- Replaces all current representations of the target resource with the uploaded content. 6 DELETE -- Removes all current representations of the target resource given by a URI. 7 TRACE -- Performs a message loop-back test along the path to the target resource. 8 CONNECT -- Establishes a tunnel to the server identified by a given URI. 1) OPTIONS:- Used when the client wants to determine other available methods to retrieve or process a document on the Web server. 2) GET:- Used when the client is requesting a resource on the Web server. 3) HEAD:- Used when the client is requesting some information about a resource but not requesting the resource itself. 4) POST:- Used when the client is sending information or data to the server—for example, filling out an online form (i.e. Sends a large amount of complex data to the Web Server). 
5) PUT:- Used when the client is sending a replacement document or uploading a new document to the Web server under the request URL. 6) DELETE:- Used when the client is trying to delete a document from the Web server, identified by the request URL. 7) TRACE:- Used when the client is asking the available proxies or intermediate servers changing the request to announce themselves. 8) CONNECT:- Used when the client wants to establish a transparent connection to a remote host, usually to facilitate SSL-encrypted communication (HTTPS) through an HTTP proxy.
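The eight method names above are a fixed set in HTTP/1.1 (plus extension-method). A small Java helper capturing that set (the class name is mine); with java.net.HttpURLConnection you would then pass one of these names to setRequestMethod, though that class does not accept CONNECT, which is reserved for proxies:

```java
import java.util.Set;

public class Http11Methods {
    /** The eight methods defined by HTTP/1.1; extension methods are also permitted. */
    private static final Set<String> METHODS = Set.of(
            "OPTIONS", "GET", "HEAD", "POST", "PUT", "DELETE", "TRACE", "CONNECT");

    public static boolean isStandard(String method) {
        return METHODS.contains(method);
    }
}
```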
22
Content Negotiation An HTTP request can specify what types of content it can handle by the entity headers Accept Accept-Charset Accept-Encoding Accept-Language The Accept header can tell what type of document can be handled Accept: audio/*; q=0.2, audio/basic In the Hypertext Transfer Protocol (HTTP), content negotiation is the mechanism that is used, when facing the ability to serve several equivalent contents for a given URI, to provide the best suited one to the final user. The determination of the best suited content is made through one of three mechanisms: specific HTTP headers by the client (server-driven negotiation) the 300 Multiple Choices or 406 Not Acceptable HTTP response codes by the server (agent-driven negotiation) a cache (transparent negotiation) Server-driven negotiation In this kind of negotiation, the browser (or any other kind of agent) sends several HTTP headers along with the URI. The Accept: header The Accept-Charset: header The Accept-Encoding: header The Accept-Language: header The User-Agent: header Agent-driven negotiation Server-driven negotiation suffers from a few downsides: It doesn't scale well. There is one header per feature used in the negotiation. If one wants to use screen size, resolution or other dimensions, a new HTTP header must be created. Sending of the headers must be done on every request. This is not too problematic with few headers, but with the eventual multiplications of them, the message size would lead to a decrease in performance. The more headers are sent, the more entropy is sent, allowing for better HTTP fingerprinting and corresponding privacy concern.
23
Content Negotiation Accept-Charset can tell the character sets handled
Accept-Charset: iso-8859-5, unicode-1-1;q=0.8 Accept-Encoding can tell the encodings handled Accept-Encoding: compress;q=0.5, gzip;q=1.0 Accept-Language Accept-Language: da, en-gb;q=0.8, en;q=0.7 Accept-Charset The Accept-Charset request-header field can be used to indicate what character sets are acceptable for the response. This field allows clients capable of understanding more comprehensive or special-purpose character sets to signal that capability to a server which is capable of representing documents in those character sets. Accept-Charset = "Accept-Charset" ":" 1#( ( charset | "*" )[ ";" "q" "=" qvalue ] ) Character set values are described in section 3.4. Each charset MAY be given an associated quality value which represents the user's preference for that charset. The default value is q=1. An example is Accept-Charset: iso-8859-5, unicode-1-1;q=0.8 The special value "*", if present in the Accept-Charset field, matches every character set (including ISO-8859-1) which is not mentioned elsewhere in the Accept-Charset field. If no "*" is present in an Accept-Charset field, then all character sets not explicitly mentioned get a quality value of 0, except for ISO-8859-1, which gets a quality value of 1 if not explicitly mentioned. If no Accept-Charset header is present, the default is that any character set is acceptable. The example Accept: audio/*; q=0.2, audio/basic SHOULD be interpreted as "I prefer audio/basic, but send me any audio type if it is the best available after an 80% mark-down in quality." The HTTP/1.1 specification (RFC 2068) defines an Accept-Charset header, but fails to define a wildcard "*" which could be used in this header to match all character sets. This proposal corrects this omission. A wildcard in the Accept-Charset header is considered important, because it allows a better specification of the acceptance of many character sets if it is used in combination with q values. The support for many different character sets is one possible route (or transition path) for web internationalization. The existence of this path, and the desirability of enabling it, was not properly recognized when the HTTP/1.1 specification [1] was written. A wildcard can only be used to give an inaccurate specification of the support levels for many character sets under HTTP/1.x-based server-driven negotiation [1], and this inaccuracy may lead to problems. When used in HTTP transparent content negotiation [2] however, the wildcard does not cause inaccurate end results, and in fact can be used as a bandwidth-saving device (see section of [3]).
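All of these Accept-* values share the same shape: comma-separated alternatives, each optionally carrying ";q=" with a quality from 0 to 1 (default 1). A parsing sketch in Java (the class name is mine; malformed input is not validated):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class AcceptHeader {
    /** Parses e.g. "compress;q=0.5, gzip;q=1.0" into value -> quality (default 1.0). */
    public static Map<String, Double> parse(String header) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String part : header.split(",")) {
            String[] bits = part.trim().split(";");
            double q = 1.0;                             // q defaults to 1 when absent
            for (int i = 1; i < bits.length; i++) {
                String param = bits[i].trim();
                if (param.startsWith("q=")) q = Double.parseDouble(param.substring(2));
            }
            out.put(bits[0].trim(), q);
        }
        return out;
    }
}
```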
24
Dates For caching and the Expires header, the client and server need to exchange dates
HTTP recognises three date formats Sun, 06 Nov 1994 08:49:37 GMT ; RFC 822, updated by RFC 1123 Sunday, 06-Nov-94 08:49:37 GMT ; RFC 850, obsoleted by RFC 1036 Sun Nov  6 08:49:37 1994 ; ANSI C's asctime() format SP = single space
25
Dates HTTP-date = rfc1123-date | rfc850-date | asctime-date
rfc1123-date = wkday "," SP date1 SP time SP "GMT" rfc850-date = weekday "," SP date2 SP time SP "GMT" asctime-date = wkday SP date3 SP time SP 4DIGIT date1 = 2DIGIT SP month SP 4DIGIT ; day month year (e.g., 02 Jun 1982) date2 = 2DIGIT "-" month "-" 2DIGIT ; day-month-year (e.g., 02-Jun-82) date3 = month SP ( 2DIGIT | ( SP 1DIGIT )) ; month day (e.g., Jun  2) SP = single space
26
Dates time = 2DIGIT ":" 2DIGIT ":" 2DIGIT ; 00:00:00 - 23:59:59
wkday = "Mon" | "Tue" | "Wed" | "Thu" | "Fri" | "Sat" | "Sun" weekday = "Monday" | "Tuesday" | "Wednesday" | "Thursday" | "Friday" | "Saturday" | "Sunday" month = "Jan" | "Feb" | "Mar" | "Apr" | "May" | "Jun" | "Jul" | "Aug" | "Sep" | "Oct" | "Nov" | "Dec" SP = single space
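The three grammars above can be tried in turn with java.text.SimpleDateFormat. The patterns below are my own mapping of the grammar to pattern letters; note that HTTP dates are always in GMT, and that the asctime form carries no zone of its own, so GMT is imposed on the formatter:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public class HttpDates {
    private static final String[] PATTERNS = {
        "EEE, dd MMM yyyy HH:mm:ss zzz",   // RFC 1123: Sun, 06 Nov 1994 08:49:37 GMT
        "EEEE, dd-MMM-yy HH:mm:ss zzz",    // RFC 850:  Sunday, 06-Nov-94 08:49:37 GMT
        "EEE MMM d HH:mm:ss yyyy"          // asctime:  Sun Nov 6 08:49:37 1994
    };

    /** Tries each HTTP date format in turn; throws if none matches. */
    public static Date parse(String text) {
        for (String pattern : PATTERNS) {
            SimpleDateFormat f = new SimpleDateFormat(pattern, Locale.US);
            f.setTimeZone(TimeZone.getTimeZone("GMT"));   // HTTP dates are GMT
            try {
                return f.parse(text);
            } catch (ParseException e) {
                // fall through and try the next format
            }
        }
        throw new IllegalArgumentException("Unrecognised HTTP date: " + text);
    }
}
```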
27
Authentication If a server wishes the client to authenticate its request, it does so by first rejecting the request with a "401" message. As part of this rejection, it should indicate in the "WWW-Authenticate" field information about the authorisation "realm", so that the client can determine if it possesses an authorisation for that realm. HTTP is able to use several authentication mechanisms to control access to specific websites and applications. Some of these methods use the 401 status code and the WWW-Authenticate response header. Basic The username and password are sent as unencrypted base64-encoded text. You should always use HTTPS, as the password is not encrypted and can easily be captured and reused if you use plain HTTP. Digest The credentials are passed to the server in hashed form. Although the credentials cannot be captured over HTTP, the request can be replayed using the hashed credentials. NTLM This method uses a challenge/response mechanism that does not allow password capture or replay attacks even over plain HTTP. It only works with HTTP/1.1 persistent connections, it cannot always be used with all HTTP proxies, and it should not be used if connections are regularly closed by your web server. Advantages (of Digest) The credentials are actually hashed (MD5(username:realm:password)) The nonce the server returns can contain timestamps; this information can be used to prevent replay attacks The server can keep a list of recently issued or used server nonce values in order to prevent reuse Disadvantages The method is still vulnerable, and the client cannot verify the server's identity this way. Also, some passwords are stored using reversible encryption. NTLM Authentication NTLM is an authentication method developed by Microsoft and optimized for Windows platforms. NTLM is considered to be more secure than Digest.
NT LAN Manager (NTLM) is a challenge-response authentication method that is a more secure variant of Digest authentication. NTLM uses Windows credentials rather than unencoded credentials and requires multiple interactions between the client and the server.
28
Authentication The client can then try again, but this time it includes a user-id and password. This is not a very secure scheme: all the HTTP messages are sent in plain-text format, and the user-id and password are not encrypted in any way. Hence HTTPS. HTTP is able to use several authentication mechanisms to control access to specific websites and applications. Some of these methods use the 401 status code and the WWW-Authenticate response header.

Basic: the username and password are sent as unencrypted base64-encoded text. You should always use HTTPS, as the password is not encrypted and can easily be captured and reused if you use plain HTTP.

Digest: the credentials are passed to the server in hashed form. Although the credentials themselves cannot be captured over HTTP, a request can be replayed using the hashed credentials. Advantages: the credentials are actually hashed (MD5(username:realm:password)); the nonce the server returns can contain timestamps, which can be used to prevent replay attacks; and the server can keep a list of recently issued or used nonce values in order to prevent reuse. Disadvantages: the method is still vulnerable, the client cannot verify the server's identity this way, and some passwords are stored using reversible encryption.

NTLM: this method uses a secure challenge/response mechanism that does not allow password capture or replay attacks even over HTTP. It only works with HTTP/1.1 persistent connections, cannot always be used with HTTP proxies, and should not be used if connections are regularly closed by your web server. NTLM is an authentication method developed by Microsoft and optimized for Windows platforms, and is considered to be more secure than Digest; it uses Windows credentials rather than unencoded credentials and requires multiple interactions between the client and the server.
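To make the Basic scheme concrete, here is a minimal sketch of how a client builds the Authorization header value: it is just base64 over "username:password", which is an encoding, not encryption (hence the advice to pair Basic with HTTPS). The class and method names are illustrative, not from the slides.

```java
import java.util.Base64;

// Sketch: how HTTP Basic authentication encodes credentials.
// The header value is "Basic " + base64(username + ":" + password).
// Base64 is reversible encoding, not encryption, which is why Basic
// must be paired with HTTPS.
public class BasicAuthDemo {
    static String basicAuthHeader(String user, String password) {
        String credentials = user + ":" + password;
        return "Basic " + Base64.getEncoder()
                .encodeToString(credentials.getBytes());
    }

    public static void main(String[] args) {
        // Classic example credentials from RFC 7617
        System.out.println(basicAuthHeader("Aladdin", "open sesame"));
        // prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

Anyone who captures this header can decode the password directly, which is exactly the weakness the slide describes.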
29
POST versus GET "Normal" queries use GET.
Strictly, if a request is "idempotent" it should use GET. Idempotent means that the client is not asking for a state change on the server, and would expect a repeat request to return the same result. This is the norm for static document requests.

There are other methods, but for the most part GET and POST are your two basic alternatives. The method is what the form uses to send its information to the server; note that the form data is sent in plain text regardless of whether GET or POST is used. With GET, the content of your form is URL-encoded into a query string in the browser address bar; GET is meant for getting data from the server, while POST is meant for posting data to it. One useful consequence of the query string appended to the URL is that search terms submitted with GET can be bookmarked as a result set. POST data, on the other hand, is sent in the body of the request after the headers and is not visible to the user as a query string. In the end, both GET and POST are sent as plain text, and anyone positioned between your computer and the server can retrieve the data if they are so inclined. A further note for CGI programming: GET data is available as an environment variable, whereas POST data arrives through the standard input stream (stdin). A good basic rule: when you are sending off form data, use POST; if you are just running search queries, GET may be appropriate.

"Safe" means a request doesn't cause any side effects; a safe request just grabs data from a database and displays it. Static pages, browsing source code, reading pages online: these are all "safe" requests. Idempotent means that doing the request 10 times has the same effect as doing it once. An idempotent request might create something in a database the first time, but it won't do it again; it will just return a reference to it the next time around.
30
POST versus GET GET should also be used for idempotent form requests. Again, these are requests that do not cause any (visible) change of state. GET parameters are passed after a '?', in the form vbl=value. Any problematic characters have to be escaped; for example, a space is written as its ASCII value in hex, '%20' (or as '+'). GET URLs can become very long. They can also be a security leak, since the form data is visible in the URL and is often saved in bookmarks, log files, etc.
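The escaping rules above ('%20' or '+') can be seen directly with java.net.URLEncoder, which applies the form-encoding convention GET uses for query strings. This is a small sketch; the parameter names and the example.com URL are made up for illustration.

```java
import java.net.URLEncoder;
import java.io.UnsupportedEncodingException;

// Sketch: building a GET query string with java.net.URLEncoder.
// URLEncoder follows the form-encoding convention, so a space becomes
// '+' (in a URL path it would instead be escaped as %20).
public class QueryStringDemo {
    static String param(String name, String value)
            throws UnsupportedEncodingException {
        return URLEncoder.encode(name, "UTF-8") + "="
             + URLEncoder.encode(value, "UTF-8");
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        String query = param("q", "Tim Berners-Lee") + "&" + param("lang", "en/au");
        System.out.println("http://example.com/search?" + query);
        // prints: http://example.com/search?q=Tim+Berners-Lee&lang=en%2Fau
    }
}
```

Note how the space becomes '+' and the '/' becomes '%2F': any character with a special meaning in a URL must be escaped, which is exactly why long forms make GET URLs unwieldy.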
31
POST versus GET Note that a GET request that, for example, increases a count of logins on the server is still regarded as idempotent, since the change is not visible to the client. Queries may be intended to result in state changes on the server, e.g. uploading a file or confirming a transaction. These queries should use POST, and include the form data in the content part of the message. SOAP (see later) is criticised for forcing the use of POST even for idempotent queries. SOAP, originally defined as Simple Object Access Protocol, is a protocol specification for exchanging structured information in the implementation of Web Services in computer networks. It relies on Extensible Markup Language (XML) for its message format, and usually relies on other application-layer protocols, most notably Hypertext Transfer Protocol (HTTP) or Simple Mail Transfer Protocol (SMTP), for message negotiation and transmission.
32
URL The class java.net.URL is designed to make it easier to handle fetching URL data; you don't have to open sockets yourself. SimpleURL.java Usage: java SimpleURL <url-domain-name>

/**
 * SimpleURL.java
 */
import java.net.*;
import java.io.*;

public class SimpleURL {

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java SimpleURL url");
            System.exit(1);
        }
        URL url = null;
        try {
            url = new URL(args[0]);
        } catch (MalformedURLException e) {
            e.printStackTrace();
            System.exit(2);
        }
        Object content = null;
        try {
            content = url.getContent();
        } catch (IOException e) {
            System.exit(2);
        }
        // The content is actually a subclass of InputStream
        BufferedReader reader =
            new BufferedReader(new InputStreamReader((InputStream) content));
        try {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } catch (IOException e) {
            System.exit(3);
        }
    }
} // SimpleURL
33
URLConnection If you want to get extra information about the URL, the class URLConnection can be used. URLInfo.java Usage: java URLInfo <url-domain-name> Running URLInfo prints the connection.getContentType() and connection.getContentLength() information for the URL.

/**
 * URLInfo.java
 */
import java.net.*;
import java.io.*;

public class URLInfo {

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("Usage: java URLInfo url");
            System.exit(1);
        }
        URL url = null;
        try {
            url = new URL(args[0]);
        } catch (MalformedURLException e) {
            e.printStackTrace();
            System.exit(2);
        }
        URLConnection connection = null;
        try {
            connection = url.openConnection();
            connection.connect();
            System.out.println("Type " + connection.getContentType());
            System.out.println("Length " + connection.getContentLength());
        } catch (IOException e) {
            System.exit(3);
        }
    }
} // URLInfo
34
Forms HTTP connections write information from the client to the HTTP server and read the reply. This can be used to fetch static URLs (as above). It can also be used to post form data and read the reply; URLConnection can be used for this, sending in the body of a POST request the same form parameters you would otherwise have appended to the URL.
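A minimal sketch of posting form data with URLConnection might look as follows. The endpoint URL and parameter names are hypothetical; the point is that the body uses the same name=value&name=value encoding that GET puts in the query string, but it travels in the request body after the headers.

```java
import java.io.*;
import java.net.*;

// Sketch: POSTing form data with HttpURLConnection to a hypothetical
// endpoint. setDoOutput(true) is what enables writing a request body.
public class PostFormDemo {
    // Build an application/x-www-form-urlencoded body.
    static String formBody(String[][] params) throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (String[] p : params) {
            if (body.length() > 0) body.append('&');
            body.append(URLEncoder.encode(p[0], "UTF-8"))
                .append('=')
                .append(URLEncoder.encode(p[1], "UTF-8"));
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        String body = formBody(new String[][] {{"name", "Jo Bloggs"}, {"id", "42"}});
        URL url = new URL("http://example.com/form");   // hypothetical endpoint
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);                         // allow writing a request body
        conn.setRequestProperty("Content-Type",
                "application/x-www-form-urlencoded");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.getBytes("UTF-8"));
        }
        // Read the reply just as for a GET
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) System.out.println(line);
        }
    }
}
```

Compare this with SimpleURL above: the only real additions are setDoOutput(true), the Content-Type header, and the write to the connection's output stream before the reply is read.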
35
Authorization To access a password-protected site, you need to send the user name and password. The article describes how to do this! [1] Install a default authenticator: Authenticator.setDefault(new MyAuthenticator()); [2] Create an Authenticator subclass:

class MyAuthenticator extends Authenticator {
    protected PasswordAuthentication getPasswordAuthentication() {
        return new PasswordAuthentication("username", "password".toCharArray());
    }
}
36
Proxies To fetch a URL using a proxy, you send the HTTP request to the proxy; the full URL of the destination must be given. Java has built-in support for proxies; you only need to define some properties, e.g.:

java -DproxySet=true \
     -DproxyHost=proxy.monash.edu.au \
     -DproxyPort=8080 \
     SimpleURL <url-domain-name>

Some proxies (e.g. the Monash proxy) require authentication before passing on requests.
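The same effect can be achieved from code by setting the proxy system properties before the first connection is opened. This sketch uses the standard Java networking property names http.proxyHost and http.proxyPort; the host value is the slide's example proxy, used here purely for illustration.

```java
// Sketch: configuring a proxy from code instead of with -D flags.
// http.proxyHost / http.proxyPort are the standard Java networking
// system properties; the host below is the slide's example value.
public class ProxySetup {
    static void useProxy(String host, int port) {
        System.setProperty("http.proxyHost", host);
        System.setProperty("http.proxyPort", Integer.toString(port));
    }

    public static void main(String[] args) {
        useProxy("proxy.monash.edu.au", 8080);
        // Subsequent URL/URLConnection requests now go via the proxy.
        System.out.println(System.getProperty("http.proxyHost") + ":" +
                           System.getProperty("http.proxyPort"));
    }
}
```

Setting properties from code keeps the proxy configuration out of the launch command, at the cost of hard-coding it into the program.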
37
Server-side Processing - Servlets
Java servlets run as modules within an HTTP server. Servlets run as threads within a continuous Java process; the startup costs for a thread are lower than the startup costs for a process, so servlets are faster than the corresponding CGI script. Servlets have a standard API for accessing cookies, etc., and also have session management. Week 6 source code examples: HelloWWW.java

Through the introduction of HTML and its distribution system, the World Wide Web, use of the Internet has mushroomed at a phenomenal rate. However, HTML alone can only be used to create static Web pages: pages whose content is determined at the time of writing and never changes. Though this is perfectly adequate for some applications, an increasing number of others have a requirement for dynamic web pages: pages whose content changes according to the particular user or in response to changing data. Some common examples: the results of a real-time, online survey; the results of a search operation; the contents of an electronic shopping cart. One powerful and increasingly popular way of satisfying this need is to use Java servlets.

Common Gateway Interface (CGI) is a standard method for web server software to delegate the generation of web content to executable files. Such files are known as CGI scripts or simply CGIs; they are usually written in a scripting language. A servlet is a program written in Java that runs on a Web server. It is executed in response to a client's (i.e., a browser's) HTTP request and creates a document (usually an HTML document) to be returned to the client by the server. It extends the functionality of the server without the performance limitations associated with CGI programs. All the major Web servers now have support for servlets. A servlet is Java code that is executed on the server, while an applet is Java code that is executed on the client; as such, a servlet may be considered to be the server-side equivalent of an applet.

However, Java's servlet API is not part of J2SE (Java 2 Standard Edition), though it is included in J2EE (Java 2 Enterprise Edition). This means that non-Enterprise users must download an implementation of the Java servlet API separately.
38
import javax.servlet.*; import javax.servlet.http.*;
import java.io.*;
import javax.servlet.*;
import javax.servlet.http.*;

public class HelloWWW extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        String docType =
            "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 " +
            "Transitional//EN\">\n";
        out.println(docType +
            "<HTML>\n" +
            "<head><title>Hello WWW</title></head>\n" +
            "<body>\n" +
            "<h1>Hello WWW</h1>\n" +
            "</body></html>");
    }
}

Servlets must import the following two packages: javax.servlet and javax.servlet.http. Since servlet output uses a PrintWriter stream, package java.io is also required. Servlets that use the HTTP protocol must extend class HttpServlet from package javax.servlet.http. The two most common HTTP requests (as specified in the HTML pages that make use of servlets) are GET and POST. At the servlet end, method service uses either method doGet or method doPost in response to these requests; the programmer should override (at least) one of these two methods. Without going into unnecessary detail, you should use the POST method for multiple data items and either GET or POST for single items. All three methods (doGet, doPost and service) have a void return type and take two arguments: an HttpServletRequest object and an HttpServletResponse object. The former encapsulates the HTTP request from the browser and has several methods, but none will be required by our first servlet. The second argument holds the servlet's response to the client's request. There are just two methods of this HttpServletResponse object that are of interest to us at present: void setContentType(String <type>), which specifies the data type of the response (normally "text/html"); and PrintWriter getWriter(), which returns the output stream object to which the servlet can write character data to the client (using method println).
There are four basic steps in a servlet:
1. Execute the setContentType method with an argument of "text/html".
2. Execute the getWriter method to obtain a PrintWriter object.
3. Retrieve any parameter(s) from the initial Web page. (Not required in our first servlet.)
4. Use the println method of the above PrintWriter object to create the elements of the Web page to be 'served up' by our Web server.
These steps are normally carried out by doGet or doPost. Note that these methods may generate IOExceptions and ServletExceptions, which are checked exceptions (and so must be either thrown or handled locally). Note also that step 4 involves a lot of tedious outputting of the required HTML tags.
39
Form Parameters Available in request.getParameter(String name)
ThreeParams.java may be invoked in a browser window as follows: Before we consider the structure of a servlet, recall that a servlet will be executed on a Web server only in response to a request from a user's browser. Though the servlet may be invoked directly by entering its URL into the browser, it is more common for a servlet to be run from an HTML page via an HTML form, with the form's METHOD specifying either 'GET' or 'POST' and its ACTION specifying the address of the servlet. The URL for such a servlet has the following format:
40
public class ThreeParams extends HttpServlet {
    public void doGet(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        response.setContentType("text/html");
        PrintWriter out = response.getWriter();
        String title = "Reading Three Request Parameters";
        out.println(ServletUtilities.headWithTitle(title) +
            "<BODY>\n" +
            "<H1 ALIGN=CENTER>" + title + "</H1>\n" +
            "<UL>\n" +
            "  <LI>param1: " + request.getParameter("param1") + "\n" +
            "  <LI>param2: " + request.getParameter("param2") + "\n" +
            "  <LI>param3: " + request.getParameter("param3") + "\n" +
            "</UL>\n" +
            "</BODY></HTML>");
    }

    public void doPost(HttpServletRequest request, HttpServletResponse response)
            throws ServletException, IOException {
        doGet(request, response);
    }
}
41
public class ServletUtilities {
    public static final String DOCTYPE =
        "<!DOCTYPE HTML PUBLIC \"-//W3C//DTD HTML 4.0 Transitional//EN\">";

    public static String headWithTitle(String title) {
        return (DOCTYPE + "\n" +
            "<HTML>\n" +
            "<HEAD><TITLE>" + title + "</TITLE></HEAD>\n");
    }
}

After this, make sure Apache Tomcat is running, then open a browser and type the appropriate URL to run the AdderServlet. Finally, type the corresponding URL to run the ShoppingCart servlets.
42
Session Tracking There is a session tracking class, so that you don't need to bother with the details of cookies, URL rewriting, etc. To get a session: HttpSession session = request.getSession(true); This creates a session object if one did not exist before, or returns the current session object. You will see this in the ShoppingCart servlet, in the class Selection.java.

Sessions: One fundamental restriction of HTTP is that it is a stateless protocol; that is to say, each request and each response is an independent transaction. However, different parts of a Web site often need to know about data gathered in other parts. For example, the contents of a customer's electronic cart on an e-commerce shopping site need to be updated as the customer visits various pages and selects purchases. To cater for this and a great number of other applications, servlets implement the concept of a session. A session is a container where data about a client's activities may be stored and accessed by any of the servlets that have access to the session object. The session expires automatically after a prescribed timeout period (30 minutes for Tomcat) has elapsed, or may be invalidated explicitly by the servlet (by execution of method invalidate).

A session object is created by means of the getSession method of class HttpServletRequest. This method is overloaded: HttpSession getSession() and HttpSession getSession(boolean create). If the first version is used, or the second version is used with an argument of true, then the server returns the current session if there is one; otherwise, it creates a new session object. For example: HttpSession cart = request.getSession(); If the second version is used with an argument of false, then the current session is returned if there is one, but null is returned otherwise.

A session object contains a set of name-value pairs. Each name is of type String and each value is of type Object. Note that objects added to a session must implement the Serializable interface. (This is true for the String class and for the type-wrapper classes such as Integer.) A servlet may add information to a session object via the following method: void setAttribute(String <name>, Object <value>) Example: String currentProduct = request.getParameter("Product"); cart.setAttribute("currentProd", currentProduct); The method to remove an item is removeAttribute, which has the following signature: void removeAttribute(String <name>) For example: cart.removeAttribute("currentProd"); To retrieve a value, use: Object getAttribute(String <name>) Note that a typecast will usually be necessary after retrieval. For example: String product = (String) cart.getAttribute("currentProd"); To get the names of all values held, use: Enumeration<String> getAttributeNames() For example: Enumeration<String> prodNames = cart.getAttributeNames();
43
Session Tracking Different sessions from different browsers will each get their own session object; session objects are not shared. To add to a session: void setAttribute(String <name>, Object <value>); For example: HttpSession cart = request.getSession(); cart.setAttribute("currentProd", request.getParameter("Product"));
44
Session Tracking To retrieve, use: Object getAttribute(String <name>);
The retrieved value will need to be cast to the right type
45
Cookies To set a cookie, use
Cookie cookie = new Cookie("sessionID", "1234"); Once a cookie has been created, it must be added to the HttpServletResponse object via the following method: void addCookie(Cookie <cookie>) For example: response.addCookie(myCookie); To access cookies, use the following method of class HttpServletRequest: Cookie[] getCookies() For example: Cookie[] cookies = request.getCookies(); Refer to Jan Graba, Chapter 8: CookieAdder, which modifies the SimpleAdder servlet.

Cookies provide another means of storing a user's data for use while he/she is navigating a Web site. Whereas sessions provide data only for the duration of one visit to the site, cookies store information that may be retrieved on subsequent visits to the site. (In actual fact, Session objects make use of Cookie objects.) They can be used to personalise pages for the user and/or select his/her preferences. Cookies have been used by CGI programmers for years, and the developers of Java's servlet API incorporated this de facto standard into the servlet specification. What is a cookie, though? A cookie is an associated name-value pair in which both name and value are strings (e.g., "username" and "Bill Johnson"). It is possible to maintain a cookie simply for the duration of a browsing session, but it is usually stored on the client computer for future use. Each cookie is held in a small file sent by the server to the client machine and retrieved by the server on subsequent visits by the user to the site. The constructor for a Java Cookie object has this signature: Cookie(String <name>, String <value>) (Note that there is no default constructor.)

The lifetime of a cookie is determined by method setMaxAge, which specifies the number of seconds for which the cookie will remain in existence (usually a rather large number!). If any negative value is specified, then the cookie goes out of existence when the client browser leaves the site. A value of zero causes the cookie's immediate destruction. Other useful methods of the Cookie class (with pretty obvious purposes) are: void setComment(String <value>) (a comment is optionally used to describe the cookie); String getComment(); String getName(); String getValue(); void setValue(String <value>); int getMaxAge().

Example: This will be a modification of the earlier 'Simple Adder' example. On the user's first visit to the site, he/she will be prompted to enter his/her name and a choice of both foreground and background colours for the addition-result page. These values will be saved in cookies, which will be retrieved on subsequent visits to the site. If the user fails to enter a name, there will be no personalised header. Failure to select a foreground colour will result in a default value of black being set, whilst failure to select a background colour will result in a default value of white being set. The only differences in the initial HTML file are in the lines giving the names of the two files involved:
46
References Refer: http://www2.themanualpage.org/http/index.php3
Chapter 12, Distributed Systems Principles and Paradigms (2007), by Andrew S. Tanenbaum and Maarten Van Steen, 2nd edition, Prentice-Hall. Chapter 8, Introduction to Network Programming with Java (2007), by Jan Graba, Springer.