Internet / Intranet Fall 2000 Class 4 Web Server Technology HTTP Protocol Log Files.

1 Internet / Intranet Fall 2000 Class 4 Web Server Technology HTTP Protocol Log Files

2 Brandeis University Internet/Intranet Spring 2000. Class 4 Agenda. Discuss Homework: Milestone 2 Due Week 6; Mini-Homework Due Next Week. Overview of Web Servers and Server Technology Presentations. HTTP: The Protocol For Communication Between Web Browser and Server. Log Files. Lab Work: HTTP; Log Files (Mini-Homework).

3 Web Servers. A Basic Web Server is Just a File Server: Client Requests a File via HTTP Protocol; Server Delivers the File via HTTP Protocol; Server Maps URL to a Subdirectory; Web Server Needs Appropriate Permissions to Access Files/Directories. Supports Non-HTTP Protocols: FTP, Gopher, etc. A Web Server is Not HTML Specific: Typically Identifies a Filetype by Extension or Directory Where File Exists.
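
The URL-to-file mapping and extension-based file typing described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `DOCUMENT_ROOT` and `map_url_to_file` are hypothetical names, and a real server would also check file permissions and resolve symbolic links.

```python
import mimetypes
import posixpath
from urllib.parse import unquote, urlsplit

DOCUMENT_ROOT = "/var/www/html"  # hypothetical document root, not from the slides


def map_url_to_file(url, document_root=DOCUMENT_ROOT):
    """Map a URL to a path under the document root and guess its MIME type."""
    url_path = unquote(urlsplit(url).path) or "/"
    # Collapse "." and ".." segments; for an absolute path, ".." cannot climb
    # above "/", so the result always stays under the document root.
    # (String handling only: symbolic links are NOT resolved here.)
    clean = posixpath.normpath(url_path)
    file_path = posixpath.join(document_root, clean.lstrip("/"))
    # The file type is identified purely from the extension.
    content_type, _encoding = mimetypes.guess_type(file_path)
    return file_path, content_type or "application/octet-stream"
```

A request for `http://example.com/docs/index.html` would map to `/var/www/html/docs/index.html` with type `text/html`.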

4 Additional Common Web Server Features. Additional Security Beyond That Provided by O/S. Scripting: Ability to Dynamically Create a Web Page; Run a Program Instead of Returning a File (CGI); Return the Program Output as the Requested File. Administration: Log Files; Performance Monitoring.

5 Advanced Web Server Features. Virtual Hosting: Allow Multiple URLs to Map to Same Computer. Performance Optimization: Caching; Reliability; Scalability. Proxy Servers (For Security and Performance): Fetch Documents That are on Other Computers; Cache Them Locally; Allows for Easy Scalability; Multiple Proxy Servers Can Cache Documents From One Source Computer. Embedded Scripting: Server Side Includes; Custom Scripting Languages; Server API.

6 Web Servers – Added Functionality. Database Connectivity: SQL, MySQL. Directory Listings: Icons, etc. Built-In Search Engines. Built-In ImageMap Handling. Multimedia Support. Session Emulation. Streaming Multimedia. Advanced Security: Encrypted HTTP; S-HTTP (Secure HTTP) – CommerceNet; SSL (Secure Sockets Layer) – Netscape. Web Server “Add-Ons”: CGI Substitutes / CGI Optimizations; Cold Fusion.

7 Web Server History. All Web Servers Have a Common Root: httpd (NCSA). UNIX Orientation: Many Features are Essentially UNIX Features. Apache. Website (O’Reilly). Netscape Enterprise Server. Microsoft Internet Information Server. A Slew of Others.

8 Apache. UNIX Origins – Now Ported to NT. Evolved From httpd. Freeware. Typical UNIX Application: Public Source Code; Many Defaults, Conventions, BUT All is Configurable; No GUI Interface; Configured via Scripts, Shell Commands, Config Files. Various “Flavors”: Many Optional Features; API; ApacheSSL.

9 IIS / Netscape. Microsoft IIS: Not Strictly Derived From httpd/Apache; Windows NT; However, Functionally Very Similar to Apache; Emulates Many UNIX Conventions (E.g. Forward Slashes); Configuration via GUI; Personal Web Server; Peer Web Server. Netscape: Multi-Platform; UNIX is Preferred Platform; Less “Open” Than Apache; More Secure?

10 UNIX File Structure. Forward Slashes (/) to Separate Filenames, Directories. Case Sensitive File Names (Windows is Not). No Limit on Filename Size / Extensions: Extensions are by Convention. Root is “/”. User Home Directory is “~/”. Symbolic Links / Aliases: Directories Can Be Spread Over Multiple Drives; Can Create Non-Hierarchical Structure. File Permissions: Read, Write, Execute; Separate Permissions for Owner, Group, All. Directories are Special Cases of Files: Execute Permissions = Able to Browse Directory.
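
As a small illustration of the owner/group/all permission bits, Python's standard `stat` module can decode a numeric UNIX mode into the familiar `rwx` string. The helper names below are hypothetical, and the directory check looks only at the "all" (other) execute bit, on the assumption that the web server is not the owner or in the group.

```python
import stat


def describe_mode(mode):
    """Render a numeric UNIX mode as a string such as '-rw-r--r--'."""
    return stat.filemode(mode)


def others_can_enter_directory(mode):
    """True if the mode is a directory whose 'all' (other) execute bit is set."""
    return stat.S_ISDIR(mode) and bool(mode & stat.S_IXOTH)
```

For example, `describe_mode(0o100644)` yields `-rw-r--r--` (a regular file readable by everyone, writable by the owner), and `describe_mode(0o040755)` yields `drwxr-xr-x`.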

11 Web Server Configuration. Directory Structure: Virtual Document Tree; Access to User Directories (UNIX: ~user). Symbolic Links: Be Careful, May Link You Out of Directory Structure. Case Sensitivity. Ownership. Access: Server is a Process Started by a User; Has the Permissions of the User Who Started It. Default Documents. Allow Directory Browsing. Scripting: Who is Allowed to Run Scripts? How are Scripts Identified?

12 Web Server File Access Control / Security. Directory: O/S Level Security. IP, Domain Level Security: Spoofing. Directory Access: .htaccess. Microsoft Front-Page Extensions. Encryption: S-HTTP (Web Protocols Only); SSL (TCP/IP Level; V1.0 – V2.X: Security Holes Found, Fixed; V3.0 Is Current; Uses Port 443); Microsoft PCT (Response to Holes in SSL 2.0; Now Use SSL).

13 Server Administration. Need Sysadmin and O/S Expertise. Lots of “Holes”, Gotchas: Whenever Scripts are Allowed; FTP. Who is Allowed to Change Documents? Who is Allowed to Change Server Configuration? How do They Get Access? Direct Access; Remote Access (e.g. FTP). Log Files: Accessibility. Directory Structure Management.

14 HTTP. The Protocol For Requesting and Delivering Web Pages: Not Restricted to Returning HTML Files. Client Server Model: Request / Response. TCP/IP Protocol Using Port 80: Supports Other Ports, Can Be Run Over Other Protocols. “Replaced” FTP as the Primary Method For Internet File Transfer. Stateless. Uses MIME Format to Encapsulate Data. Message Structure Similar to SMTP Mail Messages: Message Header (metadata); Message Body (data), Separated From Header by a Blank Line; Browser Only Displays Body, Not Header; No Restrictions on Message Size / Format (as with SMTP).
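
The header/body structure of an HTTP message can be made concrete with a short sketch that splits a raw message at the first blank line. The function name `split_http_message` is hypothetical, and the sketch ignores details such as repeated or folded header lines.

```python
def split_http_message(raw):
    """Split raw HTTP bytes into (start_line, headers, body).

    The header block ends at the first blank line (CRLF CRLF);
    everything after that is the message body.
    """
    head, _sep, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _colon, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body
```

Given a response such as `HTTP/1.0 200 OK` followed by `Content-Type: text/html`, a blank line, and some HTML, the browser would display only the third element of the returned tuple.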

15 HTTP Versions. HTTP 1.0: Commonly Used Version. HTTP 1.1: Formalizes Many Extensions to Version 1.0; Supports Persistent Connections; Supports Compression/Decompression; Supports Virtual Hosting (Single Server and IP Address Serving Multiple Host Names, via the Host Header); Supports Multiple Languages; Supports Byte Range Transfers (Useful For Re-Sending Interrupted Data Transfers; Similar to Process Used By XMODEM, etc.).
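
To show how a byte range transfer lets a client resume an interrupted download, here is a minimal sketch that applies a single-range `Range` header value to some data and produces the matching `Content-Range` value. The function name `apply_byte_range` is hypothetical, and multi-range requests are deliberately not handled.

```python
def apply_byte_range(data, range_header):
    """Return (partial_body, content_range) for a single 'bytes=' range."""
    unit, _eq, spec = range_header.partition("=")
    if unit.strip() != "bytes" or "," in spec:
        raise ValueError("only a single bytes range is supported in this sketch")
    first, _dash, last = spec.strip().partition("-")
    size = len(data)
    if first == "":
        # Suffix form "bytes=-N": the final N bytes of the resource.
        start = max(size - int(last), 0)
        end = size - 1
    else:
        start = int(first)
        # Open-ended form "bytes=N-": from offset N to the end.
        end = min(int(last), size - 1) if last else size - 1
    if start > end:
        raise ValueError("range cannot be satisfied")
    return data[start:end + 1], f"bytes {start}-{end}/{size}"
```

A client that already holds the first four bytes of a ten-byte file would ask for `bytes=4-` and receive the remaining six bytes.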

16 HTTP Overview (diagram). Client (Browser) Sends an HTTP Request to the Web Server and Receives an HTTP Response. The Web Server Returns HTML From the File System, or HTML Produced by a Server Application via CGI.

17 HTTP Commands. Simple Structure. Main Methods: GET <URI> HTTP/1.0 (Request the File Specified By the URL; URI is URL Without Protocol/Port); HEAD (Request the HTTP Header Information Only; Don’t Return the File Itself); POST (Sends Data to The Server; Typically Data From a Form). Defined, But Not Widely Implemented: PUT, DELETE, LINK, UNLINK.
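
The simple command structure can be illustrated by building the text of a request from a URL. This is a sketch only: `build_request` is a hypothetical helper, and it adds a `Host` header (optional in HTTP/1.0) for convenience.

```python
from urllib.parse import urlsplit


def build_request(method, url, version="HTTP/1.0"):
    """Build the text of a body-less HTTP request for the given URL."""
    parts = urlsplit(url)
    # The request line carries only the path (and query), not the scheme or host.
    uri = parts.path or "/"
    if parts.query:
        uri += "?" + parts.query
    lines = [
        f"{method} {uri} {version}",
        f"Host: {parts.netloc}",
        "",  # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines)
```

`build_request("HEAD", ...)` produces the same text with a different method, which asks the server for the headers alone.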

18 Common HTTP Header Fields. Additional “Parameters” to the HTTP Commands. Used in HTTP Requests. Accept: Lists the MIME Types That Client Can Accept (E.g. Accept: text/plain, text/html or Accept: */*). Accept-Charset: Lists Character Sets That Client Can Accept (ASCII, ISO-8859-1 Are Assumed). Accept-Encoding. Accept-Language. Authorization: Basic – UserName:Password (Base64 Encoding). Cookie. From: E-mail Address of Requesting User; Not Typically Used For Privacy Reasons; Primarily Used By Automated Clients (e.g. Bots).
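
The Basic scheme of the Authorization header is simply `UserName:Password` run through Base64, which the following sketch demonstrates. The helper names are hypothetical. Note that Base64 is an encoding, not encryption: anyone who sees the header can reverse it.

```python
import base64


def basic_auth_value(username, password):
    """Build the value of an 'Authorization' header for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def decode_basic_auth(header_value):
    """Recover (username, password) from a Basic Authorization value."""
    scheme, _space, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization value")
    decoded = base64.b64decode(token).decode("utf-8")
    username, _colon, password = decoded.partition(":")
    return username, password
```

For the user name `Aladdin` and password `open sesame`, the header value is `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`.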

19 Common HTTP Header Fields (2). Host: Virtual Host, One Server Handles Multiple Sites. If-Modified-Since: Only Return Data if it Has Been Modified Since This Date. Pragma: General Purpose For “Additional” Headers Not in Standard. Referer: The URL That Referred One to This URL. User-Agent: Name/Version of the HTTP Client. Used in HTTP Responses. Allow: Lists the Available Commands Supported by Server. Content-Encoding: Allows for Passing Data in Compressed Formats. Content-Language: Describes the Natural Language of the Intended Audience.

20 Common HTTP Header Fields (3). Content-Length: Size of the Message Body. Content-Type: The MIME Type For the Data. Date. Expires: HTTP Clients Should Not Cache Data After This Date. Last-Modified. Location: Used For Redirection. MIME-Version. Pragma: E.g. no-cache. Retry-After: When Server is Unavailable, Info On When to Try Back. Server: Name/Version of the HTTP Server.

21 Common HTTP Header Fields (4). Title: Descriptive Title of the File. WWW-Authenticate: When Authorization Denied, Tells Client Which Methods of Authentication are Supported. HTTP Status Codes: Returned By the Server In First Line of Response. Informational (100-199); Successful (200-299); Redirection (300-399; Location in HTTP Header Specifies Redirection); Client Error (400-499); Server Error (500-599).
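
Because the class of a status code is determined by its first digit, classifying a response takes one integer division. A minimal sketch, with the hypothetical name `status_class`:

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class name for a numeric HTTP status code."""
    return STATUS_CLASSES.get(code // 100, "Unknown")
```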

22 Common Status Values. 200 – OK. 201 – Created (Post Request Was Fulfilled). 204 – No Content (OK, Nothing For Client to Display). 300 – Multiple Choices (Requested Resource Available From Multiple Locations; List of Locations Returned in the Response). 301 – Moved Permanently. 302 – Moved Temporarily. 304 – Not Modified (Document Hasn’t Been Modified Since If-Modified-Since Date). 400 – Bad Request. 401 – Unauthorized. 403 – Forbidden. 404 – Not Found. 500 – Internal Server Error. 501 – Not Implemented (Server Does Not Support This Request). 502 – Bad Gateway (Invalid Response From Server). 503 – Service Unavailable.
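
The interplay between the If-Modified-Since request header and the 304 status can be sketched as a server-side decision. The function name `conditional_status` is hypothetical, and the sketch assumes the server knows the document's last-modified time as a timezone-aware `datetime`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def conditional_status(if_modified_since, last_modified):
    """Return 304 if the document is unchanged since the client's date, else 200."""
    if if_modified_since is None:
        return 200  # unconditional request: always send the document
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the header
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    if last_modified.replace(microsecond=0) <= since:
        return 304
    return 200
```

With a 304 the server sends headers only, and the browser redisplays its cached copy.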

23 Cookies. Cookies Are Name/Value Pairs Stored by the Client. Passed in the HTTP Header. Cookies Have Associated Expiration: Session (Default); Date / Time. Associated With a URL Path, Not a Page! Allows Passing Parameters Between Web Pages. Thus Cookies are Used to Provide State Information to a Stateless Protocol.
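
A brief sketch of how cookies travel in the headers, using Python's standard `http.cookies` module. The helper names and the cookie names in the example are made up for illustration: the server issues a `Set-Cookie` header tied to a path, and the client sends the name/value pairs back in a `Cookie` header on later requests.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name, value, path="/", max_age=None):
    """Build a 'Set-Cookie' response header line for one name/value pair."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = path  # cookies are tied to a URL path, not a page
    if max_age is not None:
        # Without an expiration the cookie lasts only for the browser session.
        cookie[name]["max-age"] = max_age
    return cookie.output(header="Set-Cookie:")


def parse_cookie_header(header_value):
    """Parse the value of a 'Cookie' request header into a dict."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}
```

Reading the same pairs back on every request is what gives the server state on top of the stateless protocol.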

24 Web Server HTTP Functionality. Content Negotiation: Choose From Several Different Formats Based on Request. Language Negotiation: Choose From Versions of Same Document Based on Request. Support for HTTP-Put, HTTP-Delete. Keep-Alive. As-Is: Server Doesn’t Add HTTP Headers; Allows You to Create Specific Behavior (Redirect to Another Site; Never Saved in Browser’s Cache).

25 Class Exercise: HTTP http://www.mkat.com/brandeis/httplist.cfm Viewhttp.exe

26 Server Log Files. Records Server Activity.

27 Some Definitions. Hits: Each HTTP Request is a Hit; Accessing a Web Page May Result in Multiple Hits (E.g. Each Graphic is a Hit). Page Views: Accessing a Single Web Page is a Page View (E.g. Typing in a URL or Clicking on a Link). Visits: A Single Client’s Visit to Your Entire Site (Session); May Include Multiple Page Views; What Constitutes a Second Visit From the Same Client? Why is This Important? Terms are Sometimes Used Interchangeably and Improperly; Compare Apples to Apples; Important for Commercial Web Sites (Advertising is Based on Site Access, Typically Sold on Page View Basis).
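
The difference between hits and page views can be shown by counting both over a list of requested paths. The rule used here for what counts as a "page" (a path ending in `.htm`, `.html`, `.cfm`, or `/`) is an assumption for illustration; real analysis tools use their own, configurable rules.

```python
PAGE_SUFFIXES = (".htm", ".html", ".cfm", "/")  # assumed definition of a "page"


def count_hits_and_page_views(request_paths):
    """Return (hits, page_views) for an iterable of requested URL paths."""
    hits = 0
    page_views = 0
    for path in request_paths:
        hits += 1  # every HTTP request is a hit
        without_query = path.split("?", 1)[0].lower()
        if without_query.endswith(PAGE_SUFFIXES):
            page_views += 1  # graphics and other embedded files are excluded
    return hits, page_views
```

One page with two embedded graphics therefore registers three hits but a single page view.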

28 Server Log Files. Many Variations to Web Server Log File Formats. Four Log Files: Access (Transfer) Log (Each Hit is Recorded; User, Date/Time, HTTP Request, etc.); Error Log (Date/Time, Error); Referrer Log (Referring Page, Destination Page); Agent (User) Log (Client’s Browser). Clearly a Need for Standardization: Linking the Four Log Files Together.

29 Common Log Format. Host: IP Address (or Hostname) of Client; Some Servers Perform Lookup of IP Address. RFC931: Remote Identity of the Client From an identd (RFC 931) Lookup; Seldom Used. Authuser (HTTP Request: Authorization): UserName if Username Authorization is Required. Time Stamp (HTTP Response: Date): E.g. [10/Jun/1998:14:23:34 -0700]. Request: The Actual HTTP Request, E.g. GET /index.htm HTTP/1.1.

30 Common Log Format (2). Status: The HTTP Response Status Code. Transfer Volume (HTTP Response: Content-Length).
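
The seven Common Log Format fields can be pulled out of a log line with one regular expression. This is a sketch under stated assumptions: the names `CLF_PATTERN` and `parse_clf_line` are hypothetical, a `-` in the size field is treated as zero bytes, and month names are assumed to be in English.

```python
import re
from datetime import datetime

# host rfc931 authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<rfc931>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_clf_line(line):
    """Parse one Common Log Format line into a dict, or return None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%d/%b/%Y:%H:%M:%S %z"
    )
    record["status"] = int(record["status"])
    # Servers write "-" when no body was sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return record
```

A line such as `192.0.2.10 - - [10/Jun/1998:14:23:34 -0700] "GET /index.htm HTTP/1.1" 200 2326` (an invented example) parses into host `192.0.2.10`, status `200`, and a transfer volume of `2326` bytes.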

31 Extended Log File Format. Seven Common Log Format Fields Plus: Referrer (HTTP Request: Referer); User Agent (HTTP Request: User-Agent), Identifies Browser. Other Common Fields: Cookies, Can Help Identify Users.

32 Issues. Client vs. User: Typically Don’t Have User Level Information; Only Record IP Address of Computer Used For Access; If Fixed IP Address For a Single User’s Machine, This Can Identify the User. Dynamically Assigned IP Addresses: Identifies the Overall Domain (e.g. AOL.com). Proxy Servers: All Clients Have IP Address of Proxy Server; Multiple “Sessions” at Same Time. Impossible to Have Truly Accurate Information: Log File Analysis Software Has Algorithms to Identify Page Views, Visits. Client Level Caching Affects Logs. “ISP” Level Caching Affects Logs (E.g. AOL Maintains a Cache). No Requirement for Clients, ISPs to Follow Expiration Info.
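
One simple algorithm of the kind analysis software might use to identify visits is an inactivity timeout: a request from the same host after a long enough gap starts a new visit. The sketch below assumes a 30-minute timeout and identifies clients by host alone, so it inherits every limitation listed above (proxies, dynamic addresses, caching). The name `count_visits` is hypothetical.

```python
from datetime import datetime, timedelta


def count_visits(hits, timeout=timedelta(minutes=30)):
    """Count visits in an iterable of (client_host, datetime) hits.

    A hit starts a new visit when the host has not been seen before,
    or was last seen more than `timeout` ago.
    """
    last_seen = {}
    visits = 0
    for host, when in sorted(hits, key=lambda hit: hit[1]):
        previous = last_seen.get(host)
        if previous is None or when - previous > timeout:
            visits += 1
        last_seen[host] = when
    return visits
```

Changing the timeout changes the visit count for the same log, which is one reason figures from different tools are hard to compare.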

33 Log File Maintenance on Server. Log Files Grow Rapidly. Log Files Compress Very Nicely. Server Configurable: Generate Daily/Weekly/Monthly Logs. Maintenance Scripts to Clean Up Log Files: Compress; Archive; Cycle (E.g. Maintain Current Month’s Files).
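
Log files compress well because their lines are highly repetitive. A minimal sketch of the "compress" step of a maintenance script, using Python's standard `gzip` module on log text held in memory (the helper names are hypothetical; a real script would work on files and also handle archiving and cycling):

```python
import gzip


def compress_log(text):
    """Compress log text with gzip and return the compressed bytes."""
    return gzip.compress(text.encode("utf-8"))


def compression_ratio(text):
    """Return the compressed size divided by the original size in bytes."""
    original = text.encode("utf-8")
    return len(gzip.compress(original)) / len(original)
```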

34 Log File Analysis. Big Business: Bread and Butter of Sites Driven By Advertising Revenue. Evaluation Factors: Log File Formats Supported; Ability to Link Multiple Logs; How Log Files are Accessed (e.g. via FTP); Display Methodology (E.g. Available Via Web Pages); Lookup Capabilities (E.g. Map User-Agent to Browser; E.g. Resolve IP Addresses to Domains, Regions); Level of Analysis (E.g. Calculating Visits, Return Visitors); Configurability; Drill-Down Capabilities; Enterprise Capabilities (Ability to Manage Multiple Sites).

35 Log File Analysis Options. Important to Understand the Core Log Files: Log File Analysis Programs Make Some Assumptions. Freeware. Commercial. Service Bureaus.

36 In Class Exercise / Mini Homework. Download http://www.mkat.com/brandeis/sample.log (View in Text Editor; Load Into Excel, Delimited / Spaces). Review the Log File in Detail (Do Not Use Analysis Tools). Describe What You Can Learn From the Log File. Add it To Your Homepage Along With In Class Exercises. Due Next Week.

37 Resources. HTTP: Stein pp. 47-57. Server Comparison: http://webcompare.internet.com/chart.htm Apache Server: www.apache.org Website Server: http://website.ora.com Microsoft IIS: http://www.microsoft.com/NTWorkstation/downloads/Recommended/ServicePacks/NT4OptPk/Default.asp

