COMS E6125 Web-enHanced Information Management (WHIM), Prof. Gail Kaiser, Spring 2012 (lecture of 24 January 2012)


1 COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2012

2 Today’s Topic
Basic Web Mechanics
– URI
– HTTP
– Client/Server Intermediaries

3 What is a “URI”?
Uniform Resource Identifier
Compact string of characters for identifying an abstract or physical resource
Conforms to a simple and extensible format

4 What is a “Resource”?
Some piece of information that can be identified by a URI
The most common kind of resource is a file
But may also be a dynamically-generated query result, the output of a script, a document available in several languages or formats, etc.

5 Uniform Resource Identifier
Uniform: aka Universal - same string can be used with same semantic interpretation, even when mechanisms used to access the resource differ
Resource: Conceptual mapping to an entity or set of entities - not necessarily the entity that corresponds to that mapping at any particular instance in time
Identifier: An object that can act as a reference to something that has identity

6 Key Requirement: Transcribability
May be transcribed from a non-network source
Often needs to be remembered by people
Should consist of characters that can most likely be typed into a computer, within the constraints imposed by keyboards (and related input devices) across languages and locales

7 Why do we usually say URL rather than URI?
A Uniform Resource Locator (URL) refers to the subset of URIs that identify resources via a representation of their primary access mechanism (i.e., their network “location”)
Most popular form of URI

8 What’s a URI that’s not a URL?
URN = Uniform Resource Name
Subset of URIs that denote a resource independent of its current location, the name by which it is known, or the mechanism by which it is accessed
Required to remain globally unique and persistent even when the resource ceases to exist or becomes unavailable
Thus not necessarily “retrievable”

9 URN vs. URL Example
Assume a published book (the resource)
The ISBN (International Standard Book Number) is a 10-digit number (13 digits since 2007) that uniquely identifies books and book-like products published internationally - this is the URN
The entire contents of the book might be placed on a Web server and on an FTP server at ftp://ftp.xyz.com/book.gz - both of these locations are URLs
All of these are URIs

10 URI Syntax
<scheme>:<scheme-specific-part>
For a URL, the scheme indicates the protocol employed for retrieval (http, ftp, file, mailto, etc.)
More generally, a scheme is a specification for defining the syntax and semantics of the rest of the URI
Extensible because new schemes can be defined, with their own scheme-specific format after the colon (:)

11 URL Notation
<scheme>://<authority><path>?<query>
<authority>: typically, an Internet domain name
<path>: specific to the authority, identifies the resource within the scope of the scheme and authority
<query>: a string of information to be interpreted by the resource
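
The notation above can be tried out with Python's standard `urllib.parse` module. This is only a minimal sketch: the URL reuses the host and port from the port-override example later in the deck, and the `?lang=en` query is an assumption added for illustration.

```python
from urllib.parse import urlsplit

# Illustrative URL: host, port and path follow the deck's port example;
# the query string is an assumption added so every component is present.
url = "http://www.example.com:2012/doc.html?lang=en"
parts = urlsplit(url)

scheme = parts.scheme      # protocol employed for retrieval
authority = parts.netloc   # typically an Internet domain name (plus optional port)
path = parts.path          # identifies the resource within the authority
query = parts.query        # string of information interpreted by the resource

print(scheme, authority, path, query, sep="\n")
```

`urlsplit` also exposes `parts.hostname` and `parts.port`, which separate the domain name from an explicit port when one is given.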

12 What’s a “domain name”?
Domain Name System (DNS)
– Maps domain names to IP addresses and vice versa
– Hierarchy of DNS servers for top level domains (.com, .edu, .uk, etc.), second level domains (columbia.edu, ibm.com, etc.), and so on
– Eventually finds IP address for individual host (e.g., bank.cs.columbia.edu)
– DNS servers cache responses based on TTL = Time to Live
Originated ~1982

13 Relative URLs
Allows document trees to be independent of their location and scheme
A single set of hypertext documents can be simultaneously traversable via each of the ftp, http and file schemes
Such document trees can be moved, as a whole, without changing any of the relative references
Resolved to full (absolute) URLs using a base URL

14 Example Relative URLs
/path/to/resource.txt
/relative/URI/with/absolute/path/to/resource.txt
relative/path/to/resource.txt
../../../resource.txt
resource.txt
/resource.txt#frag01
#frag01
[empty string]
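
Resolution of relative references against a base URL can be sketched with `urllib.parse.urljoin`. The base URL here is a hypothetical one chosen for illustration; the relative references are a subset of the forms listed on the slide.

```python
from urllib.parse import urljoin

# Hypothetical base URL (an assumption, not taken from the slides).
BASE = "http://www.example.com/a/b/c.html"

def resolve(relative: str) -> str:
    """Resolve a relative reference to an absolute URL using BASE."""
    return urljoin(BASE, relative)

examples = [
    "/path/to/resource.txt",          # absolute path: replaces the base path
    "relative/path/to/resource.txt",  # relative path: appended to the base directory
    "../resource.txt",                # climbs one directory level first
    "resource.txt",                   # sibling of the base document
    "#frag01",                        # same document, different fragment
    "",                               # empty string: the base document itself
]

for ref in examples:
    print(repr(ref), "->", resolve(ref))
```

Because only `BASE` mentions the scheme and host, the same set of relative references would resolve equally well under an `ftp:` or `file:` base, which is the location independence the previous slide describes.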

15 URI “Standard”
URI is an Internet protocol element defined currently in RFC 3986 (2005)
Originally RFC 1630 (1994)

16 What is an “RFC”?
Request for Comments
One of a series, begun in 1969, of numbered informational documents and standards followed by commercial software and freeware in the Internet and Unix communities
All Internet standards are recorded in RFCs

17 Who keeps track of RFCs?
IETF = Internet Engineering Task Force
Open, all-volunteer organization, with no formal membership or membership requirements
Organized into a large number of working groups, each dealing with a specific topic
April 1st RFCs

18 What is “W3C”?
World Wide Web Consortium defines data formats and usage conventions as well as Internet protocols relevant to the Web
Members pay fees depending on country, revenues and non-profit/for-profit status
Otherwise organized similarly to the IETF, but writes “Recommendations” instead of “Requests for Comments”

19 Back to URLs
Most Web documents use the “http” scheme (or “https” = http over TLS/SSL)
What is “http” (HyperText Transfer Protocol)?

20 HTTP = HyperText Transfer Protocol
Most Web documents are accessed using the “http” scheme, the default Internet protocol used to deliver data on the WWW
Usually through TCP/IP sockets on port 80, but can use any port and can be implemented on top of any reliable networking protocol
A Web browser (HTTP client) sends requests to a Web server (HTTP server), which sends responses back to the client

21 What’s “TCP/IP”?
IP = Internet Protocol
– Delivers individual packets from one host to another, based on their IP address (in IPv4, four 8-bit octets)
– Network routers direct traffic of IP packets
Analogous to telephone numbers (area code plus exchange plus 4 digits plus extension) and postal address (zip code plus street name plus building number plus apartment number)

22 What’s “TCP/IP”?
TCP = Transmission Control Protocol
– Provides an abstraction of reliable, bidirectional connections for the delivery of IP packets to a particular port at a given IP address
– The so-called well known ports (< 1024) are reserved for specific protocols (telnet, ftp, smtp, pop3, imap, etc.)
– By default, HTTP uses port 80; this can be changed in the URL
– http://www.example.com:2012/doc.html
Main alternative is UDP = User Datagram Protocol, no connection, no reliable delivery (used by DNS)

23 HTTP History
HTTP/0.9 (1990) - simple protocol for raw data transfer
HTTP/1.0 (1996) - allows MIME-like messages, containing meta-data about the resources transferred and modifiers on the request/response semantics
HTTP/1.1 (1999) – lots of practical improvements, e.g., caching policies, chunked encoding, persistent connections
W3C closed its HTTP activity, but the IETF still has a working group to revise the protocol

24 What is “MIME”?
Multipurpose Internet Mail Extensions
Standard representation for “complex” message bodies (numerous RFCs since 1993)
Examples include messages with embedded graphics or audio clips, messages with file attachments, messages in Japanese or Russian, signed messages

25 HTTP Properties
Uses URLs for identifying Web resources
Request-response – always initiated by client to server, the server responds with results
Stateless – each request-response pair independent from every other, so any state information (login credentials, shopping carts, etc.) needs to be encoded somehow

26 HTTP Request/Response
[Figure labels: HTTP Client, HTTP request, Port 80, Processing, Response, Other port]
Web server processes HTTP requests, generally over TCP Port 80
The request specifies a resource URL
The server parses the URL and processes the request:
– Returns a document with its type information
– Invokes a program or script, and returns its output
The output (including metadata) is sent back to the client as a response message

27 HTTP Requests
Small number of request types (GET, POST, HEAD, etc.)
Request may contain additional information, e.g. client info, parameters for forms, cookies, etc.
Consists of a start-line, zero or more headers (one per line), an empty line (CRLF) indicating the end of the header fields, and possibly a message-body

28 HTTP Responses
Larger number of response codes (200 OK, 404 NOT FOUND)
Message body only allowed with certain response status codes
Includes MIME metadata as well as “payload” (data)

29 Start Line
HTTP Version (0.9, 1.0, 1.1)
URI
Method (request) or Status Code (response)

30 Sample HTTP Exchange
To retrieve the file at the given URL, first open a socket to the host psl.cs.columbia.edu, port 80 (use the default port because none is specified in the URL)
Connect to on port ok

31 Sample
Then, send something like the following through the socket:
GET / HTTP/1.1[CRLF]
Host: psl.cs.columbia.edu[CRLF]
Connection: close[CRLF]
User-Agent: Web-sniffer/ (+http://web-sniffer.net/)[CRLF]
Accept-Encoding: gzip[CRLF]
Accept-Charset: ISO ,UTF-8;q=0.7,*;q=0.7[CRLF]
Cache-Control: no-cache[CRLF]
Accept-Language: de,en;q=0.7,en-us;q=0.3[CRLF]
Referer: [CRLF]
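
The message layout described earlier (start-line, one header per line, then an empty line) can be sketched as a small helper that assembles the bytes a client would write to the socket. The function name `build_request` and the reduced header set are assumptions made for illustration; only the host name and the `GET /` request come from the sample.

```python
CRLF = "\r\n"

def build_request(method, target, host, headers=None, version="HTTP/1.1"):
    """Assemble an HTTP request with no message-body (hypothetical helper)."""
    lines = [f"{method} {target} {version}",  # start-line
             f"Host: {host}"]                 # identifies the (virtual) host
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")      # one header per line
    # The extra CRLF produces the empty line that ends the header fields.
    return (CRLF.join(lines) + CRLF + CRLF).encode("ascii")

request = build_request(
    "GET", "/", "psl.cs.columbia.edu",
    {"Connection": "close", "Accept-Encoding": "gzip"},
)
print(request.decode("ascii"))
```

Sending these bytes over a TCP connection to port 80 of the host would constitute the request half of the exchange; the sketch stops short of opening the socket so that it runs without network access.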

32 Sample
The server should respond with something like the following:
HTTP Status Code: HTTP/ Forbidden[CRLF]
Content-Length:218[CRLF]
Content-Type:text/html[CRLF]
Server:Microsoft-IIS/6.0[CRLF]
X-Powered-By:ASP.NET[CRLF]
Date: Sat, 22 Jan :024:22 GMT[CRLF]
Connection:close[CRLF]
Error Directory Listing Denied [LF]
Directory Listing Denied
This Virtual Directory does not allow contents to be listed.

33 Some Request Headers
User-Agent: identifies the program that's making the request, in the form "Program-name/x.xx", where x.xx is the alphanumeric version of the program (e.g., browser)
– User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9) Gecko/ Firefox/3.0
Referer: the URL of the previous webpage from which a link was followed
– Referer:

34 Some Response Headers
Server: analogous to User-Agent:, identifies the server software in the form "Program-name/x.xx"
– Server: Apache/2.2.8 (Ubuntu)
Last-Modified: gives the modification date of the resource that's being returned, e.g., for use in caching
– Uses Greenwich Mean Time, in the format Last-Modified: Sat, 22 Jan :46:32 GMT

35 HTTP URIs
Servers may accept URIs up to some bounded length (often 255) or of “unbounded” length; status code 414 (Request-URI Too Long) signals that the limit was exceeded
Equivalence comparison

36 Request Messages
Method SP Request-URI SP HTTP-Version CRLF
GET
Equivalent to client making TCP connection to bank.cs.columbia.edu on port 80, then sending
GET /
Host:
Host field allows for virtual hosts

37 What is a “virtual host”?
Enables the same machine to host multiple domain names, sometimes at the same IP address (name-based virtual hosting)
Important for website hosting (e.g., maps to /www/foo/site1 and maps to /www/bar/site2), but usually there can be only one secure https website per IP address/port
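
Name-based virtual hosting amounts to a lookup keyed on the request's Host header. The sketch below assumes two made-up host names (`www.site1.example` and `www.site2.example`); only the two document roots are taken from the slide.

```python
from typing import Optional

# Host names are hypothetical; the document roots are the ones on the slide.
VHOSTS = {
    "www.site1.example": "/www/foo/site1",
    "www.site2.example": "/www/bar/site2",
}

def document_root(headers: dict) -> Optional[str]:
    """Pick the document root for a request based on its Host header."""
    host = headers.get("Host", "")
    hostname = host.split(":", 1)[0].lower()  # drop an optional ":port" suffix
    return VHOSTS.get(hostname)               # None if the name is not hosted here

print(document_root({"Host": "www.site1.example"}))
print(document_root({"Host": "WWW.SITE2.EXAMPLE:80"}))
```

Both names can resolve to the same IP address; the server tells the sites apart only because the client repeats the name in the Host field.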

38 GET
Retrieve whatever information (in the form of an entity) is identified by the URL
If the URL refers to a data-producing process, it is the produced data (given the input parameters after the “?”, if any) that is returned as the entity in the response - not the source text of the process (unless that text happens to be the output of the process)
&name2=val2

39 Conditional and Partial GET
Conditional if the request message includes an If-Modified-Since, If-Unmodified-Since, If-Match, If-None-Match, or If-Range header field
Partial if the request message includes a Range header field
Don’t retrieve data the client doesn’t need (e.g., at least the part already up to date in cache)
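
A server-side sketch of one conditional case, If-Modified-Since, may make the idea concrete. The function name and the dates are assumptions for illustration; a real server also handles the other conditional headers listed above, which this sketch ignores.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

def conditional_status(last_modified: datetime, request_headers: dict) -> int:
    """Return 304 if the client's cached copy is still current, else 200."""
    header = request_headers.get("If-Modified-Since")
    if header is None:
        return 200                       # unconditional GET: send the entity
    try:
        since = parsedate_to_datetime(header)
    except (TypeError, ValueError):
        return 200                       # unparseable date: ignore the condition
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304                       # Not Modified: no message-body is sent
    return 200

# Hypothetical modification time of the resource.
modified = datetime(2011, 1, 22, 10, 0, 0, tzinfo=timezone.utc)
cached_date = format_datetime(modified, usegmt=True)
print(cached_date, conditional_status(modified, {"If-Modified-Since": cached_date}))
```

A 304 response lets the client reuse its cached copy, so the data it already holds is not transferred again.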

40 HEAD
Identical to GET except that the server must not return a message-body in the response - only returns headers
Often used for testing hypertext links for validity and modification
Can mark cache entries as stale if certain header information changes (e.g., length, last-modified)

41 POST
Used to request that the server accept the entity enclosed in the request as a new subordinate of the resource identified by the Request-URI in the Request-Line
Actual function performed by the POST method is determined by the server, usually dependent on the Request-URI

42 POST supports several functions
Annotation of an existing resource
Posting a message to a bulletin board, newsgroup, mailing list, or similar group of articles
Providing a block of data, such as the result of submitting a form, to a data-handling process
Extending a database through an append operation

43 POST vs. GET
GET can only be used to send relatively small amounts of data to a server, with the data following the ? character
The rest of the request-URI (before the ?) refers to some kind of processing program
GET /run.cgi?name1=val1&name2=val2 HTTP/1.0
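
The request line on this slide can be built and taken apart again with `urllib.parse`. This is a sketch of the encoding only; how `/run.cgi` would actually consume the parameters is outside what the slides describe.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Client side: encode the parameters after the "?" character.
params = {"name1": "val1", "name2": "val2"}
request_target = "/run.cgi?" + urlencode(params)
request_line = f"GET {request_target} HTTP/1.0"

# Server side: the part before "?" names the processing program,
# the part after it carries the input parameters.
split = urlsplit(request_target)
program = split.path
received = parse_qs(split.query)

print(request_line)
print(program, received)
```

`urlencode` also percent-escapes characters such as spaces and `&` inside values, which matters as soon as the data is less tidy than `val1`.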

44 PUT and DELETE
Often unsupported (501 Not Implemented)
PUT requests that the enclosed entity be stored under the supplied Request-URI
– May create a new resource at a new URI, or modify an existing resource already at that URI
DELETE requests that the origin server delete the resource identified by the Request-URI
– May be overridden, e.g., by human intervention, even if status code indicates successfully completed
Effectively supplanted by WebDAV

45 OPTIONS and TRACE
OPTIONS allows the client to determine the requirements associated with a resource, or the capabilities of a server (OPTIONS *), without implying a resource action or initiating a resource retrieval
TRACE used to invoke application-layer loop-back of the request message, allowing the client to see what is being received at the other end of the request chain for testing or diagnostic information

46 HTTP Responses
HTTP-Version SP Status-Code SP Reason-Phrase CRLF
Example: HTTP/ Not Found
Status code: 3-digit integer result code of the attempt to understand and satisfy the request
Reason phrase: short textual description of the Status-Code

47 Response Messages
Larger number of response codes (200 OK, 404 NOT FOUND)
Message body only allowed with certain response status codes
Includes MIME metadata as well as “payload” (data)

48 Status Codes
Applications need only understand first digit, treat others as equivalent to x00
1xx: Informational - Request received, continuing process ("100" : Continue, relevant to persistent connections in HTTP 1.1)
2xx: Success - The action was successfully received, understood and accepted ("200" : OK)
3xx: Redirection - Further action must be taken in order to complete the request ("300" : Multiple Choices)
4xx: Client Error - The request contains bad syntax or cannot be fulfilled ("400" : Bad Request)
5xx: Server Error - The server failed to fulfill an apparently valid request ("500" : Internal Server Error)
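
The status-line grammar and the first-digit rule can be sketched together. The sample line `HTTP/1.1 404 Not Found` is an assumption chosen for illustration, and both function names are hypothetical.

```python
# Class names follow the slide; the key is the first digit of the status code.
CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def parse_status_line(line: str):
    """Split 'HTTP-Version SP Status-Code SP Reason-Phrase CRLF' into its parts."""
    version, code, reason = line.rstrip("\r\n").split(" ", 2)
    return version, int(code), reason

def status_class(code: int) -> str:
    """Classify a status code by its first digit only."""
    return CLASSES.get(code // 100, "Unknown")

version, code, reason = parse_status_line("HTTP/1.1 404 Not Found\r\n")
print(version, code, reason, status_class(code))
```

Splitting on at most two spaces keeps a multi-word reason phrase such as "Not Found" intact, and an application that has never seen a particular code can still fall back on its class.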

49 HTTP is “Stateless”
Server doesn’t remember anything about client between connections
Not even between requests during the same persistent connection, except TCP data
So how does HTTP support “remembering” the user during a session or across sessions?
Some state can be encoded in complex URLs or otherwise in the web page itself (e.g., query strings added to links, hidden form fields)
Or saved on client in “cookies”

50 Cookies
String associated with a name/domain/path, stored at the browser
Series of name-value pairs, interpreted by the web application
Created in an HTTP response with “Set-Cookie:” (or “Set-Cookie2:”)
In all subsequent requests to this site, until the cookie’s expiration, the client sends the HTTP header “Cookie:” (or “Cookie2:”)
Often have an expiration (otherwise expire when browser closed)
Various technical, privacy and security issues (e.g., inconsistent state after using “back” button, third-party cookies, cross-site scripting)

51 Cookie Example
Set-Cookie: name=newvalue; expires=date; path=/; domain=.example.org
Set-Cookie: RMID=732423sdfs73242; expires=Sat, 31-Dec :59:59 GMT; path=/; domain=.example.net
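
Python's standard `http.cookies` module can parse a Set-Cookie value like the second example and produce the Cookie header a client would send back. This is a sketch under one assumption: the `expires` attribute is left out, because the date in the transcript is incomplete.

```python
from http.cookies import SimpleCookie

# Value of a Set-Cookie header, following the slide's second example but
# without the expires attribute (its date is incomplete in the transcript).
set_cookie_value = "RMID=732423sdfs73242; path=/; domain=.example.net"

jar = SimpleCookie()
jar.load(set_cookie_value)
morsel = jar["RMID"]

# On later requests to a matching domain/path the client returns only the
# name-value pairs, not the attributes.
cookie_header = "; ".join(f"{name}={m.value}" for name, m in jar.items())

print(morsel.value, morsel["path"], morsel["domain"])
print("Cookie:", cookie_header)
```

The attributes (`path`, `domain`, and normally `expires`) stay in the browser's cookie store and decide when the pair is sent; the server only ever sees the name and value.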

52 HTTP Request/Response
In HTTP 1.0, a connection is established by the client prior to each request and closed by the server after sending the response
Either party may close the connection prematurely, due to user action, automated time-out, or program failure
Closing of the connection by either or both parties always terminates the current request, regardless of its status
But TCP connections are expensive…

53 HTTP 1.1 “Persistent Connection”
Many Web pages consist of several files on the same server
If an HTTP 1.1 client sends multiple (pipelined) requests through a single connection, the server should send responses back in the same order
Intermediate responses "100" : Continue

54 How does the connection finally get closed?
If a request includes the "Connection: close" header, that request is the final one for the connection and the server should close the connection after sending the response
The server should also close an idle connection after some timeout period

55 Advantages of Persistent Connections
Requests and responses can be pipelined - a client makes multiple requests without waiting for each response
Network congestion reduced by fewer packets for TCP opens, and by allowing TCP sufficient time to determine the congestion state of the network
Latency on subsequent requests is reduced since there is no time spent in the TCP connection’s opening handshake

56 Basic HTTP Architecture

57 Intermediary
Program sitting in the path between HTTP clients and servers
Acts as a server to clients and as a client to origin servers or other intermediaries

58 Purposes of Intermediaries
– Reduce communication cost
– Lower the latency perceived by the client
– Reduce the load on the network
– Reduce the load on the Web server
– Implement security for an organization
– Translate requests to various servers

59 Proxy
Forwarding agent
Receives request, rewrites all or parts of the message, and forwards the reformatted request toward the server identified by the URI

60 Gateway
Receiving agent
Acts as a layer above some other server(s) and, if necessary, translates the requests to the underlying server's protocol
Example: Web mail accessing an IMAP server
– A URL identifies the mail server, mailbox, password
– Converts the HTTP request to an IMAP request, gets the IMAP response, converts it to an HTTP response

61 Tunnel
Relay point between two connections without changing the message
Looks at the first line of the HTTP message to locate the host to be contacted and accept the request
Simply relays bits between the two connection points
Does not parse or interpret messages
Used when the communication needs to pass through a firewall

62 Transcoder
Modifies data as it passes to clients, e.g., to filter ads, reduce image sizes, compress content
Particularly useful for wireless and/or constrained devices
– Convert HTML to XHTML MP
– Modify content to fit small screen
– Convert modality of interaction, e.g., driving directions from displaying text to playing audio

63 Caching
Request/response chain is shortened if one of the participants along the chain has a cached response applicable to the request

64 HTTP 1.1 Caching Support
Allows a server to determine caching policies in its response
– Expires xx-xx-xx yy:yy:yy.yy
– Cache-Control: no-store – don’t cache at all
– Cache-Control: no-cache – validate every time or don’t cache
– Cache-Control: private – can’t keep in a public cache
Secure sessions (https) generally not cached
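
How a cache might act on the three directives above can be sketched as follows. The function names are hypothetical, and the parser is deliberately simplified: it splits on commas and so does not handle quoted arguments that themselves contain commas.

```python
def parse_cache_control(value: str) -> dict:
    """Parse a Cache-Control header value into {directive: argument or None}."""
    directives = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue
        name, _, arg = item.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

def may_store(value: str, shared_cache: bool) -> bool:
    """Decide whether a response with this Cache-Control value may be stored."""
    directives = parse_cache_control(value)
    if "no-store" in directives:
        return False                      # don't cache at all
    if shared_cache and "private" in directives:
        return False                      # can't keep in a public (shared) cache
    return True

def must_validate(value: str) -> bool:
    """no-cache: a stored copy must be validated with the server before reuse."""
    return "no-cache" in parse_cache_control(value)

print(may_store("private, max-age=60", shared_cache=True))
print(may_store("private, max-age=60", shared_cache=False))
print(must_validate("no-cache"))
```

A browser's own cache would call `may_store` with `shared_cache=False`, while a caching proxy serving many users would pass `True`.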

65 HTTP 1.1 Chunked Encoding
Faster response for dynamically-generated pages or very large pages
Allows the beginning of a response to be sent before its total length is known
Each chunk is prefixed by its size in bytes (written in hexadecimal)
A zero size chunk indicates the end of the response message
If a server is using chunked encoding it must set the Transfer-Encoding header to "chunked"
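
The framing can be sketched as a pair of helper functions. Both names are hypothetical, and the sketch ignores optional trailers after the final chunk; chunk extensions after a `;` are skipped on decode.

```python
def encode_chunked(parts):
    """Frame an iterable of byte strings as an HTTP/1.1 chunked message body."""
    out = bytearray()
    for part in parts:
        if not part:
            continue  # a zero-size chunk would signal the end of the message
        size_line = f"{len(part):x}".encode("ascii")  # size in hexadecimal
        out += size_line + b"\r\n" + part + b"\r\n"
    out += b"0\r\n\r\n"  # zero-size chunk, then the empty line (no trailers)
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the original body from a chunked message body."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size_field = data[pos:eol].split(b";", 1)[0]  # ignore chunk extensions
        size = int(size_field.decode("ascii"), 16)
        pos = eol + 2
        if size == 0:
            break
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
    return bytes(body)

wire = encode_chunked([b"Hello, ", b"world"])
print(wire)
print(decode_chunked(wire))
```

Because each chunk carries its own length, the sender can start transmitting `b"Hello, "` before it knows that `b"world"` will follow, which is the point of the second bullet above.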

66 Summary
Clients (browsers) often implement many schemes
Technically, only the http scheme is the World Wide Web
But many of the more recent schemes are also associated with the Web
Clients do not always talk directly to the origin servers indicated in URLs

67 First Assignment: Paper Proposal
Sketch the topic you have in mind
Explain why your topic is relevant
Include tentative reference list (background reading to learn more about the topic)
Some general topic areas suggested at /suggested-topics/, or invent your own

68 First Assignment: “Goal” of Paper
Do not simply survey some topic
Compare this to that, argue a position in favor or against something, evaluate something according to some criteria, etc.

69 First Assignment: Background Reading
List some specific materials you intend to read to learn about the topic
– Scholarly papers from conferences or journals
– White papers
– Third-party reviews or commentaries (blogs ok)
– System documentation
– Specifications of "standards" (or proposed standards)
– Not advertising or publicity
– Not Wikipedia
Should include materials from at least two different points of view (i.e., do not get all your background information from the same website or the same organization)

70 First Assignment: Logistics
Due Tuesday January 31st by 10am
Two pages (not including optional figures and required reference list)
Submit by posting on CourseWorks
Must be in a format I can read, which means PDF, Word, HTML, or plain ASCII text (with all figures embedded or viewable in a browser without special “plugins”)
Details at bus/paper-proposal/

71 Upcoming Assignments: Paper
Paper outline due Tuesday February 14th
Full paper due Tuesday February 28th

72 Heads Up on Project
Project Proposal due Tuesday March 6th
Optionally work in teams (see /teaming-advice/)
Build a new system or extend an existing system OR evaluate/compare one or more existing system(s)
You may "continue" your paper topic towards the project, or do something entirely different

73 Student Presentations
Individual 5-10 minute talk in class
One paragraph proposal, also due Tuesday March 6th
May be based on paper, project, or some other topic
Do NOT duplicate material presented in lectures!
Last year’s presentation slides available at s11/presentations/

74 COMS E6125 Web-enHanced Information Management (WHIM)
Prof. Gail Kaiser
Spring 2012

