Presentation is loading. Please wait.

Presentation is loading. Please wait.

The HTTP Protocol && Web Server. What Does the WWW Server Do? Enables browser requests Provides –Support for retrieving hypertext documents –Manages access.

Similar presentations


Presentation on theme: "The HTTP Protocol && Web Server. What Does the WWW Server Do? Enables browser requests Provides –Support for retrieving hypertext documents –Manages access."— Presentation transcript:

1 The HTTP Protocol && Web Server

2 What Does the WWW Server Do? Enables browser requests Provides –Support for retrieving hypertext documents –Manages access to the Web site –Provides several mechanisms for executing server-side scripts Common Gateway Interface (CGI) Application Programmers Interface (API) –Produces log files and usage statistics

3 How Does a Web Server Communicate? Web browsers and Web servers communicate using a protocol known as Hypertext Transfer Protocol (HTTP) Current HTTP protocol is version 1.1 –HTTP/1.0, as defined by RFC 1945, allows messages to be in the format of MIME-like messages, containing meta-information about the data transferred and modifiers on the request/response semantics. –HTTP/1.1 extends the protocol to handle the effects of hierarchical proxies caching the need for persistent connections virtual hosts

4 A Graphical View

5 An HTTP Scenario Communication takes place over a TCP/IP connection, generally on port 80

6 Persistent Connections In the original HTTP protocol each request was made over a new connection –so an HTML page with n distinct graphic elements produced n+1 requests TCP uses a three-way handshake when establishing a connection, so there is significant latency in establishing a connection –client sends SYN, server replies ACK/SYN, client responds with ACK HTTP1.0 introduced a keep-alive feature –the connection between client and server is maintained for a period of time allowing for multiple requests and responses

7 HTTP/1.0 Keep Alive Connections Open connection Send 1st request Receive 1st response Send 2nd request Receive 2nd response etc Close connection Acknowledge connection Receive request send response receive request send response etc Close connection clientserver

8 MIME MEDIA TYPES HTTP sends the media type of the file using the Content- Type: header Some important media types are –text/plain, text/html –image/gif, image/jpeg –audio/basic, audio/wav –model/vrml –video/mpeg, video/quicktime –application/*, application-specific data that does not fall under any other MIME category, e.g. application/octet-stream

9 Multipurpose Internet Mail Extensions MIME is an Internet standard for electronic mail –Traditional e-mail was limited to ASCII text, limited line length, and limited size MIME has extended Internet e-mail to include –Unlimited text line and message length –Messages with multiple objects enclosed –Messages that point to files on another server and are automatically retrievable –International character sets in addition to US-ASCII –Formatted text including multiple font styles –Images, video clips, audio messages

10 Client HTTP Requests General form of a request has four fields: HTTP_method, identifier, HTTP_version, Body –HTTP_Method says what is to be done to the object specified in the URL; some possibilities include GET, HEAD, and POST –identifier is the URL of the resource or the body –HTTP_versionis the current HTTP version, e.g. HTTP/1.1 –Body is optional text

11 HTTP Request Methods GET, retrieve whatever information is identified by the request URL HEAD,identical to GET, except the server does not return the body in the response POST, instructs the server that the request includes a block of data, which is typically used as input to a server-side application PUT,used to modify existing resources or create new ones DELETE, used to remove existing resources TRACE, traces the requests in a chain of web proxy servers

12 Example of a GET Request A GET request asks the server to retrieve a document click here The request from the client may contain the following lines GET /html/file.html HTTP/1.1 Accept: text/plain Accept: application/pdf Accept: application/html Accept: text/x-html Accept: audio/*... Accept: */* User-Agent: NCSA Mosaic for the X Window System/2.4 Referer: http://tlaloc.sfsu.edu/html/prevfile.html If-Modified-Since: Wed, 11 Feb 1998 13:14:15 GMT {there is a blank line here which terminates the input}

13 Response of the Server to GET In response to the previous client request, the server responds with the following HTTP/1.1 200 OK Date: Monday, 29-May-98 12:02:12 GMT Server: Apache/1.4 MIME-version: 1.0 Content-Type: text/html Last-modified: Sun, 28-May-98 15:36:13 GMT Content-Length: 145 {a blank line goes here } {the contents of file.html goes here }

14 HTTP Headers HTTP/1.1 defines 46 headers divided into four categories: –general, present in requests or responses –request, present only in requests –response, present only in responses –entity, describe the content of a body

15 Examples of HTTP Headers - General Connection: close used to prevent persistent connections Date: Wed, 30 Dec 1998 23:56:32 date and time message was generated Transfer-Encoding: chunked indicates if a transformation has been made on the message Upgrade: HTTP/2.0 causes protocol to switch dynamically Via: 1.1 myproxy, 1.1 second-proxy indicates a proxy chain that the request was passed through

16 Examples of HTTP Headers - Request Accept: text/html, image/* indicates what media types are acceptable Accept-Charset: iso-8859-5 indicates acceptable character sets. By default all are acceptable Accept-Encoding: compress, gzip indicates acceptable encodings Accept-Language: en, fr=0.5 indicates language preferences, English preferred, but French also accepted Authorization: used to pass user’s credentials to the server

17 Examples of HTTP Headers - Request From: name@site.com requesting user’s email address, rarely present Host: www.sfsu.edu:8080 hostname and port of the requesting URL Referer: http://www.sfsu.edu/index.html the URL of the doucment that contains the reference to the requested URL User-agent: Mozilla/4.5 (X11;Solaris 5.1) reports the client software name and version and possibly platform

18 Examples of HTTP Headers - Response The response-header fields allow the server to pass additional information about the response which cannot be placed in the Status-Line These header fields give information about –the server –about further access to the resource identified by the Request-URI. response-headers include: Accept-Ranges | Age | ETag | Location | Proxy-Authenticate | Retry-After | Server | Vary | Warning | WWW-Authenticate

19 Examples of HTTP Headers - Response Age: 1246 Location: http://www.myco.com/page.html Public: GET, HEAD, POST, OPTIONS, PUT Server: Apache/1.3.1 WWW-AUTHENTICATE: sent with 401 Unauthorized status code, it includes authorization parameters Retry-after: 240 used with Service Unavailable status, indicates requested data will be available in 4 minutes

20 Examples of HTTP Headers - Response A URL may point to a document with multiple representations: languages, formats (html, pdf), or html features based upon user- agent e.g. the request is GET http://www.myco.com/ HTTP/1.1 User-agent: Mozilla/4.5 Accept-language: en the response is HTTP/1.1 200 OK Vary: Accept-language Content-type: text/html Content-language: en The proxy must store the fact that this doc has variants and when requested, get the proper variant

21 Examples of HTTP Headers - Entity Content-Base: http://www.sfsu.edu/somedir Content-Encoding: gzip Content-Language: en Content-Length: 7890 Content-type:text/html Expires: Wed, 30 Dec 1998 03:43:21 GMT specifies the expiration date/time of the object; a cached copy should not be used beyond; Expires 0/now is immediate Last-Modified: Wed, 30 Dec 1998 01:20:34 GMT specifies the creation or last modification time of the object on the web server

22 HTTP Status Codes - Informational After receiving and interpreting a request message, a server responds with an HTTP response message. Syntax of response is Status-Line *( general-header | response-header | entity-header ) CRLF [ message-body ] where the Status line is composed of Status-Line = HTTP-Version Status-Code Reason-Phrase CRLF

23 HTTP Status Codes 101Switching Protocols, switching either the version or the actual protocol (Informative) 200OK, request succeeded (Successful) 201Created, result is newly created (Successful) 301Moved permanently, new location is specified in Location: header (Redirection) 304Not Modified: the client or proxy copy is still up-to-date (Redirection) 400Bad request, server could not understand (Client error) 401unauthorized, authorization challenge (Client error) 403forbidden, server refuses to fulfill request; e.g. check protections (Client error) 404Not found, document does not exist (Client error) 500Internal server error, generic error code for the server 501Not implemented, request could not be serviced because server does not support it (Server error)

24 Simple Proxy Server Show sample software and do several examples On the class web page there is a simple proxy server built by Dan Boneh of Stanford See the Special Tools category start the proxy server on a specific port, e.g. 5678 Point your browser to the proxy server telnet to the proxy server issue commands to your browser and watch as the proxy server prints out the request/response messages

25 What to Look for in a Web Server Main features –platform they run on; –complete support for HTTP 1.1 –Multithreaded, load balancing –Server side includes image mapping, analysis tools Security –IP address restriction –Domain name restriction –Secure transactions: SSL or S_HTTP –Proxy server

26 How Servers Handle Multiple Requests For each request, a complete copy of the server is made and executed –Initially a parent process waits for requests; for each new request a child process is spawned Or, a single server program handles many requests simultaneously (multithreaded) –the server must keep track of all requests and switch between them as needed –writing multithreaded programs is easier in a language that supports multiple threads, e.g. Java

27 Some (free or commercially) Available Servers Some freeware servers include –Apache for UNIX, NT, http://www.apache.org –webstar for Mac http://www.biap.com Publisher Platform URL Netra ( Sun MicroSystems) Sparcstations/Solaris www.sun.com Webforce (Silicon Graphics) Indigo/MIPS/IRIX www.sgi.com Internet connection (IBM) Intel/OS2 www.ibm.com IIS (Microsoft) Intel, Others/W95//NT www.microsoft.com Enterprise (Netscape) Intel/WinNT www.netscape.com Website (O’Reilly) Intel/Win95/WinNTwww.ora.com Webserver (SPRY) Intel/Win95/WinNT www.spry.com

28 Server Usage http://www.netcraft.co.uk has studied more than 3,000,000 sites –UNIX currently dominates Internet servers – Windows NT has a small, but growing share Statistics on intranets are difficult to determine

29 Server Features Document Root Server Side Includes Authentication Proxy Servers Caching CGI Scripting Application Programmer’s Interface

30 Document Tree /serverhome other htdocs usr public home.html eecs page.html index.htmlimages banner.gif index.html The document root is /serverhome/htdocs http://domain/cs is mapped to /serverhome/htdocs/cs/index.html index.html

31 Document Types File extensions indicate the type of document,.html,.htmhtml document.txtunformatted ASCII.pspostscript.gifGIF image.jpegJPEG image.wrl,.vrmlVRML scene Some documents are doubly encoded, e.g. a compressed postscript file. Since there is no file suffix to describe this, the web server uses MIME content headers, e.g. Content-type: application/postscript Content-encoding: gzip

32 Server-Side “Includes” A server-side “include” is an action taken by the server before it delivers the document to the client Unfortunately, this feature requires the server to examine and “parse” the document, looking for the special commands –Most servers supporting this feature come with it disabled as the default –Typically, parsable HTML files are denoted by a different suffix; e.g., shtml General form:

33 Some Server-Side “Include” Directives CommandTagExplanation configerrmsgControls what messages are sent back to the client if an error occurs timefmtProvides a new format for expressing dates sizefmtProvides a format for expressing file sizes includefileGives a pathname relative to current directory echovarPrints the value of an environment variable Values for var tag are: DOCUMENT_NAMEThe current filename DATE_LOCALCurrent date using local time zone DATE_GMT Current date using Greenwich Mean Time LAST_MODIFIEDLast date/time file was touched fsizefile Prints size of the file flastmodfile Prints last modification date of the file execcmdExecutes a shell command or CGI script

34 Example of Server-Side “Includes” example of server includes Examples of Server Side Includes Local Date: Date (GMT): Last Modified On: File Size:

35 Browser Output of Server-Side “Includes”

36 Basic User Authentication Basic authentication is supported by all HTTP servers –The server administrator creates secure directories accessible via password files maintained by the server –Client requests a resource from a secure directory; e.g., GET /secure/test.html HTTP/1.0 –Server responds with authentication request to the client; e.g., HTTP/1.0 401 Unauthorized –Browser requests user name and password, scrambles them, and retries the request GET http://domain/secure/test.html HTTP/1.0 Authorization: Basic 0<V%L:EB.G-E8W))$J –Server decodes name and password and checks it against its password file

37 Web Servers as Proxy Servers A proxy is a forwarding agent, receiving requests for a URI in its absolute form, rewriting all or part of the message, and forwarding the reformatted request toward the server identified by the URI A gateway is a receiving agent, acting as a layer above some other server(s) and, if necessary, translating the requests to the underlying server’s protocol Any party to the communication between browser and final server may employ an internal cache for handling requests. The effect is to shorten the request/response chain Browser Proxy ServerWeb Server

38 Caching The goal of caching is to eliminate the need to send requests (reducing the number of network round trips) or eliminate the need to send full responses (reducing network bandwidth) The Cache-Control header field contains directives to the caching mechanism, –e.g.some request directives are: no-cachedo NOT cache max-ageclient will accept a response whose age is no greater than specified –some response directives are: publiccachable by any cache privatedo NOT cache

39 Caching client Proxy server Internet Web server We configure the browser so all requests are sent to the proxy Client to Proxy: GET http://domain/file.html HTTP/1.1 docs domain

40 Creating Server-Side Applications Servers offer two mechanisms –Application Programming Interface (API) –Common Gateway Interface (CGI) For Windows/NT IIS API applications are compiled as Dynamic Link Libraries (DLLs) and loaded by the IIS at startup API applications are efficient as no new process has to be started

41 CGI Scripting A CGI application is a separate executable process invoked by the web server to handle a specific client request Immediately before invoking the CGI program (called a script), the web server creates a set of environment variables which can be used by the invoking program Since the CGI scripts runs as a separate process, it cannot crash the web server. If the program fails, the web server eventually times out

42 Configuring a Server No matter which operating system or server, you will need to define the –location of the server (server root) –location of documents to deliver (document root) –location of CGI scripts to execute You may also wish to define –Access restrictions –Fancy indexing –server-side includes –Other features

43 Apache Web Server Apache is based on code and ideas developed at NCSA in httpd version 1.3 (early 1995)Apache is based on code and ideas developed at NCSA in httpd version 1.3 (early 1995) The name “Apache” comes from the fact that it is A set of PAtCHEs to NCSA’s httpdThe name “Apache” comes from the fact that it is A set of PAtCHEs to NCSA’s httpd Apache is free and is downloadable from http://www.apache.orgApache is free and is downloadable from http://www.apache.org

44 Apache Home Page Example of open source software Available for many platforms Windows95/NT Solaris, linux, etc.

45 How does Apache Work? Apache runs under a multitasking operating system, e.g. UNIX, OS/2, NT the binary is called httpd Key directories include: –confcontains configuration files –htdocscontains html files –logs contains logging data –cgi-bin contains executable scripts –iconsset of *.gif files used e.g. in directory listings –src Apache idles, listening to an IP address and port The webmaster controls apache through configuration file settings

46 Apache Settings - httpd.conf Directives are keywords followed by their value,e.g. Port 9637 ServerAdmin yoon@cs.sfsu.edu ServerRoot /home/csc667/WebServer/apache_1.3.4 ErrorLog logs/error_log the above are ones that must be set

47 Apache Settings - httpd.conf # KeepAlive: Whether or not to allow persistent connections (more than one request per connection). Set to "Off" to deactivate. KeepAlive On # MaxKeepAliveRequests: The maximum number of requests to allow during a persistent connection. Set to 0 to allow an unlimited amount. They recomend you leave this number high, for maximum performance. MaxKeepAliveRequests 100 # KeepAliveTimeout: Number of seconds to wait for the next request KeepAliveTimeout 15

48 Apache Settings - httpd.conf # Server-pool size regulation. Apache dynamically adapts to the load it sees --- it tries to maintain enough server processes to handle the current load, plus a few spare servers to handle transient load spikes. It does this by periodically checking how many servers are waiting for a request. If there are fewer than MinSpareServers, it creates a new spare. If there are more than MaxSpareServers, some of the spares die off. MinSpareServers 2-ISD limit MaxSpareServers 5 -ISD limit # StartServers is number of servers to start --- StartServers 2-ISD limit

49 Apache Settings - srm.conf # DocumentRoot: The directory out of which you will serve your documents. DocumentRoot /home/cscixxx/WebServer/apache/htdocs # UserDir: The name of the directory which is appended onto a user's home directory if a ~user request is received. UserDir public_html # DirectoryIndex: Name of the file or files to use as a pre-written HTML DirectoryIndex index.html

50 Apache Settings - Scripting # ScriptAlias: This controls which directories contain server scripts. Format: ScriptAlias fakename realname ScriptAlias /cgi-bin/ /home/cscixxx/WebServer/apache/cgi-bin/ # If you want to use server side includes, or CGI outside ScriptAliased directories, uncomment the following lines. # AddType allows you to tweak mime.types without actually editing it, or to make certain files to be certain types. Format: AddType type/subtype ext1 # To use CGI scripts: #AddHandler cgi-script.cgi # To use server-parsed HTML files #AddType text/html.shtml #AddHandler server-parsed.shtml

51 Authentication in Apache there are two methods for controlling access to directories –the file access.conf in the conf/ directory can be used –per-directory access rules can be set by a file placed in a specific directory the name of the file is set by the directive AccessFileName.htaccess is the default name When the server attempts to retrieve a document, it looks for an access control file in the directory or the parent directory. The file it looks for is set by the directive AccessFileName

52 Limit Sectioning Directive controls which clients can access a directory; directives within LIMIT include: –order in which deny and allow are evaluated –deny from host1, host2,... –allow from host1, host2,... –require named-users or group-users or AuthUserFile –referer allows access only from this directory –satisfy all or any

53 Using LIMIT Host Filtering is used to limit document trees to certain machines Example: to limit access to the cscixxx public_html documents to SFSU only order deny,allow deny from all allow from.sfsu.edu If someone tries to access documents in this directory from outside of sfsu they get a 403 Forbidden message

54 Using LIMIT To limit documents so SFSU people CANNOT access them order allow,deny allow from all deny from from.sfsu.edu

55 Using LIMIT Example of a directive used to limit access to only SFSU and ISI domains order deny, allow deny from all allow from 128.125 allow from 128.9

56 Example Suppose you want to restrict files in a directory called secure/ to the user named student1 and password XXXYYY Step 1. create a file called.htaccess in directory secure/ that contains these 7 lines: AuthUserFile /otherdir/.htpasswd AuthGroupFile /dev/null AuthName ByPassword AuthType Basic require user student1

57 Example Step 2. Next create the password file /otherdir/.htpasswd using the htpasswd program distributed with apache. Type htpasswd -c /otherdir/.htpasswd student1 and enter the password twice. You are done. Step 3. To add other names and passwords run htpasswd without the -c option.

58 Example To generalize this example to include several students, Step 4. create a group file called /otherdir/.htgroup which looks like: my-users: student1 student2 student3 student4 Alter the.htaccess file to look like AuthUserFile /otherdir/.htpasswd AuthGroupFile /otherdir/.htgroup Now it points to the group file AuthName ByPassword AuthType Basic require group my-users

59 Contents of.htaccess The.htaccess file contains pointers to the users file, the group file, and defines the name and type of authorization nunki.usc.edu(10): more.htaccess AuthUserFile /home-scf-03/cscixxx/pwd/.htpasswd AuthGroupFile /home-scf-03/cscixxx/pwd/.htgroup AuthName OnlineCourse AuthType Basic

60 Authentication Directives AuthType specifies the type of authorization control, basic or digest –Digest uses MD5 hashing –If AuthType is used, then AuthName, AuthGroupFile and AuthUserFile must be defined AuthName gives the name of the realm in which the user’s names and passwords are valid. For requests in the realm, the server requests a name and password AuthGroupFilegives the name of a file that contains group names and members AuthUserFile a file of usernames and encrypted passwords Both AuthGroupFile and AuthUserFile should be stored outside the document tree of the web-server; do not put it in the directory that it protects. Otherwise, clients will be able to download it.

61 Using htpasswd This program is included in the Apache distribution as htpasswd.c try htpasswd to get instructions htpasswd [-c] passwordfile username creates a new file called passwordfile and prompts for a password for username

62 Fancy Indexing FancyIndexing displays special icons for different file types Pointer to parent Folders html files compress files

63 Server Side Includes Embed a number of special commands into HTML that the server parses, call the file.shtml or.ssi AddType text/x-server-parsed-html classes.htm the above directive when placed in an.htaccess file alerts the server that the directory contains a file named classes.htm which should be parsed for SSIs

64 Common Log Format Hostclient hostname or IP address identremote identity, often a dash usrauthenticated user name timedate/time request is received reqHTTP request s1server’s HTTP response status code c1Content-length marmot.usc.edu - - [17/Feb/1999:23:21:09 -0800] "GET / HTTP/1.0" 200 1316 128.125.229.195 - - [23/Feb/1999:10:32:09 -0800] "GET / HTTP/1.1" 200 1316

65 Sample error_log Contents Here are two lines from the error_log file [Mon Jan 29 11:47:36 1999] httpd: access to /home/ scf-13/csci665/www/pleader/horohome.html failed for internet-gw.watson.ibm.com, reason: No file matching URL: /pleader/horohome.html from - [Tue Feb 6 13:45:02 1999] httpd: access to /home/ scf-13/csci665/www/cocomo/member.html failed for sunset.usc.edu, reason: No file matching URL: /cocomo/member.html from http://nunki.usc.edu:8080/ cocomo/


Download ppt "The HTTP Protocol && Web Server. What Does the WWW Server Do? Enables browser requests Provides –Support for retrieving hypertext documents –Manages access."

Similar presentations


Ads by Google