Collecting, Analyzing and Using Visitor Data Chapter 12.



Web Mining
Web-content mining: Deals with the content of web documents
Web-structure mining: Concerned with the “topology” and the use of hyperlinks that connect one page to another
Web-usage mining: Secondary data generated by user interactions with the website

Data in Web-server Access Logs
The IP address of the client making the request
The date and time of the request
The URL of the requested page
The number of bytes sent to serve the request
The user agent (the program that is acting on behalf of the user, such as a web browser or web crawler)
The referrer (the URL that triggered the request)

Common Log Format

Common Log Format: Examples
pawan [06/Sep/2001:10:46: ] "GET /s.htm HTTP/1.0"
A GET request that retrieves a file named s.htm
From a computer with the IP address of
A dash (-) tells us that the information is unavailable
raj [06/Sep/2001:11:23: ] "POST /s.cgi HTTP/1.0"
A POST request that sends data to the program s.cgi
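The field layout described above can be sketched as a small parser. This is a minimal illustration, not the method used in the chapter: the regular expression, the function name parse_clf, and the sample entry (documentation-range IP address, user "alice", full timestamp, status and size) are all assumptions made up for the example, since the entries on the slide are incomplete.

```python
import re
from datetime import datetime

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return a dict of fields for one Common Log Format line, or None."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # A dash (-) means the information is unavailable.
    for key in ("ident", "user"):
        if entry[key] == "-":
            entry[key] = None
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    entry["status"] = int(entry["status"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    # Split the request line into method and path (protocol is dropped).
    method, _, rest = entry["request"].partition(" ")
    entry["method"] = method
    entry["path"] = rest.rsplit(" ", 1)[0] if rest else ""
    return entry


# Hypothetical entry, invented for illustration only.
sample = ('192.0.2.10 - alice [06/Sep/2001:10:46:07 -0300] '
          '"GET /s.htm HTTP/1.0" 200 2267')
entry = parse_clf(sample)
print(entry["host"], entry["user"], entry["method"], entry["path"])
```

Month names are parsed with %b, so the sketch assumes an English (or C) locale.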

A Log File in Extended Format
#Version: 1.0
#Date: 12-Jan-1996
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
12:45:52 GET /foo/bar.html
12:57:34 GET /foo/bar.html
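Because an extended-format file names its own columns in the #Fields directive, a reader can be driven by that directive instead of a fixed layout. The following is a rough sketch under simplifying assumptions: the function name parse_extended_log is hypothetical, values are split on whitespace (so quoted strings containing spaces are not handled), and all directives other than #Fields are ignored.

```python
def parse_extended_log(lines):
    """Yield one dict per entry, keyed by the names in the #Fields directive."""
    fields = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directive line, e.g. "#Fields: time cs-method cs-uri".
            name, _, value = line[1:].partition(":")
            if name.strip() == "Fields":
                fields = value.split()
            continue
        values = line.split()
        # Skip entries that do not match the declared field list.
        if fields and len(values) == len(fields):
            yield dict(zip(fields, values))


# The first lines of the example log shown above.
sample_log = """\
#Version: 1.0
#Date: 12-Jan-1996
#Fields: time cs-method cs-uri
00:34:23 GET /foo/bar.html
12:21:16 GET /foo/bar.html
""".splitlines()

entries = list(parse_extended_log(sample_log))
for item in entries:
    print(item["time"], item["cs-method"], item["cs-uri"])
```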

Extended Log File: Directive Types

Extended Log File: Identifier Prefixes

Extended Log File: Mandatory Identifiers

Extended Log File: Identifiers with No Prefixes

Apache Web-server Access Log Entries
The LogFormat directive is used to specify the selection of fields in each entry
The format uses a string styled after the printf format strings in the C programming language
The layout of the Common Log Format entry
pawan [06/Sep/2001:10:46: ] "GET /s.htm HTTP/1.0"
can be specified using the following LogFormat directive:
LogFormat "%h %l %u %t \"%r\" %>s %b" common
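One way to see how the format string determines the shape of each entry is to translate it into a regular expression. The sketch below is only an illustration and covers just the seven directives of the "common" format; the table DIRECTIVE_PATTERNS, the function logformat_to_regex, and the sample entry are hypothetical. It assumes the backslash-escaped quotes in the configuration file stand for plain double quotes in the written log line.

```python
import re

# Regex fragments for the directives used in the "common" format.
DIRECTIVE_PATTERNS = {
    "%h": r"(?P<host>\S+)",
    "%l": r"(?P<ident>\S+)",
    "%u": r"(?P<user>\S+)",
    "%t": r"\[(?P<time>[^\]]+)\]",
    "%r": r"(?P<request>[^\"]*)",
    "%>s": r"(?P<status>\d{3})",
    "%b": r"(?P<size>\d+|-)",
}


def logformat_to_regex(fmt):
    """Compile a regex for a format string built from the directives above."""
    parts = []
    # The capture group keeps the directives in the split result.
    for token in re.split(r"(%>?[a-z])", fmt):
        if token in DIRECTIVE_PATTERNS:
            parts.append(DIRECTIVE_PATTERNS[token])
        elif re.fullmatch(r"%>?[a-z]", token):
            raise ValueError("directive not covered by this sketch: " + token)
        else:
            # Literal text between directives (spaces, quotes).
            parts.append(re.escape(token))
    return re.compile("^" + "".join(parts) + "$")


common = logformat_to_regex('%h %l %u %t "%r" %>s %b')

# Hypothetical entry, invented for illustration only.
line = ('192.0.2.10 - alice [06/Sep/2001:10:46:07 -0300] '
        '"GET /s.htm HTTP/1.0" 200 2267')
fields = common.match(line).groupdict()
print(fields["request"], fields["status"], fields["size"])
```

Reordering or removing directives in the format string changes the compiled pattern accordingly, which mirrors how the directive controls the selection of fields in each entry.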

Apache Common Log: Parameters

Web Access Log Analyzers (1 of 2)

Web Access Log Analyzers (2 of 2)

Analog: Summarizing Web-server Access Logs

General Summary from Analog

Monthly Report from Analog

Daily Summary from Analog

Hourly Summary from Analog

Domain Report from Analog

Organization Report from Analog

Search-word Report from Analog

Operating-system Report from Analog

Status-code Report from Analog

File-size Report from Analog

File-type Report from Analog

Directory Report from Analog

Request Report from Analog

Clickstream with Pathalizer: 7-link

Clickstream with Pathalizer: 20-link

StatViz: On-campus Session that Browses the Bulletin Board

StatViz: Off-campus Session with Three Distinct Activities

StatViz: On-campus Session with Multiple Activities

Caution: Interpreting Web-server Access Logs (Turner 2004)
You do not really know any of the following:
The identity of your readers
The number of your visitors
The number of visits
The user’s navigation path through the site
The entry point and referral
How users left the site or where they went next
How long people spent reading each page
How long people spent on the site

Nevertheless … (Turner 2004)
I’ve presented a somewhat negative view here, emphasizing what you can’t find out. Web statistics are still informative: it’s just important not to slip from “this page has received 30,000 requests” to “30,000 people have read this page”.
In some sense these problems are not really new to the web; they are just as prevalent in print media. For example, you only know how many magazines you’ve sold, not how many people have read them.
In print media we have learnt to live with these issues, using the data which are available, and it would be better if we did on the Web too, rather than making up spurious numbers.
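The gap between requests and readers can be made concrete with a small count. The sketch below is an illustration only: the function summarize and the (host, path) pairs are hypothetical, and the per-host figure is included to show that it differs from the request count, not because it equals the number of people. Proxies, shared machines and changing addresses mean a distinct host is neither one person nor one visit.

```python
from collections import Counter


def summarize(entries):
    """Map each path to (number of requests, number of distinct hosts)."""
    entries = list(entries)
    requests = Counter(path for _, path in entries)
    hosts = {}
    for host, path in entries:
        hosts.setdefault(path, set()).add(host)
    return {path: (requests[path], len(hosts[path])) for path in requests}


# Hypothetical (host, path) pairs, invented for illustration only.
log = [
    ("192.0.2.10", "/s.htm"),
    ("192.0.2.10", "/s.htm"),
    ("192.0.2.10", "/s.htm"),
    ("192.0.2.77", "/s.htm"),
    ("192.0.2.77", "/index.htm"),
]

report = summarize(log)
for path, (n_requests, n_hosts) in sorted(report.items()):
    print(path, "requests:", n_requests, "distinct hosts:", n_hosts)
```

Here /s.htm receives four requests from two distinct hosts, so reporting "four readers" would overstate what the log supports, and "two readers" is still only a guess.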