Web Analytics Xuejiao Liu INF 385F: WIRED Fall 2004.

1 Web Analytics Xuejiao Liu INF 385F: WIRED Fall 2004

2 Outline Introduction  What is Web Analytics  Why Web Analytics matters Secondary readings  Log file analysis  Web usage mining  Data preparation  KDD process  Document access in repositories

3 Log File Lowdown (Michael Calore, 2001) Log file What is in a log file  Traffic  Audience  Browsers/Platforms  Errors  Referers

4 Log File Lowdown Sample Log File adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700] "GET /about.htm HTTP/1.1" 200 3741 "http://www.e-angelica.com" "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)" Log File Analyzers  WebTrends, Sawmill, Analog, Webalizer, HTTP-analyze
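The sample entry above appears to follow the common "combined" access log layout (host, identity, user, timestamp, request, status, bytes, referer, user agent). As a minimal sketch, assuming that layout, the fields can be pulled out with a regular expression. The names `LOG_PATTERN` and `parse_log_line` are illustrative and do not come from the presentation or from any of the analyzers listed.

```python
import re
from datetime import datetime

# Assumed layout: host ident user [time] "request" status bytes "referer" "user agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # A "-" in the size field means no body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    entry["time"] = datetime.strptime(entry["time"], "%d/%b/%Y:%H:%M:%S %z")
    return entry

sample = (
    'adsl-63-183-164.ilm.bellsouth.net - - [09/May/2001:13:42:07 -0700] '
    '"GET /about.htm HTTP/1.1" 200 3741 "http://www.e-angelica.com" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"'
)
entry = parse_log_line(sample)
print(entry["host"], entry["path"], entry["status"], entry["size"])
```

Each parsed field maps onto one of the report categories from the previous slide: the request and size give traffic, the host gives audience, the user agent gives browsers/platforms, the status gives errors, and the referer field gives referers.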

5 WebTrends log file analyzer Advantages  Fast and effective  User-friendly interface  Feature-rich  Supports different operating systems Disadvantages  Not free

6 WebTrends

7 The KDD Process for Extracting Useful Knowledge from Volumes of Data (Fayyad, U., G. Piatetsky-Shapiro, et al. 1996) KDD: Knowledge Discovery in Databases  The value of data  Definitions KDD Data mining

8 The KDD Process The KDD process 1. Creating a target dataset 2. Preprocessing and data cleaning 3. Data reduction and projection 4. Data mining: choosing the data mining function; choosing the data mining algorithm 5. Interpretation and evaluation

9 The KDD Process Data Mining  Data mining involves fitting models to or determining patterns from observed data  Data mining algorithms The model The preference criterion The search algorithm
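The three components of a data mining algorithm named above can be illustrated with a deliberately tiny example. This is only a sketch on made-up data: the model is a line through the origin, the preference criterion is total squared error, and the search algorithm is an exhaustive grid search. None of these specific choices come from Fayyad et al.

```python
# Toy observed data as (x, y) pairs; the values are made up for illustration.
data = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]

def model(w, x):
    """The model: a one-parameter family of lines through the origin."""
    return w * x

def squared_error(w):
    """The preference criterion: a lower total squared error is better."""
    return sum((y - model(w, x)) ** 2 for x, y in data)

def grid_search(candidates):
    """The search algorithm: evaluate every candidate and keep the best one."""
    return min(candidates, key=squared_error)

candidates = [i / 100 for i in range(0, 401)]  # w from 0.00 to 4.00
best_w = grid_search(candidates)
print(best_w, squared_error(best_w))
```

Swapping any one component (a different model family, a different error measure, or gradient descent instead of a grid) gives a different algorithm, which is the point of separating the three.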

10 The KDD Process Data Mining  Model functions Classification Regression Clustering Dependency modeling Link analysis  Goals of Data Mining Predictive and descriptive

11 Data Preparation for Mining World Wide Web Browsing Patterns (Cooley, R. W., B. Mobasher, et al. 1999) Web Usage Mining vs. data mining The WEBMINER process  Preprocessing  Mining algorithms  Pattern Analysis

12 Data Preparation Preprocessing  Data cleaning  User identification  Session identification  Path completion  Formatting
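The first three preprocessing steps can be sketched as follows. This is a simplified illustration, not the WEBMINER implementation: the suffix list, the host-plus-user-agent heuristic for identifying users, and the 30-minute timeout are all assumptions, and the entries are made-up dictionaries with a `timestamp` in seconds. Path completion and formatting are not covered.

```python
from collections import defaultdict

SESSION_TIMEOUT = 30 * 60  # seconds; an assumed threshold, tune per site

# Assumed list of embedded resources that do not represent page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")

def clean(entries):
    """Data cleaning: keep successful requests for pages only."""
    return [
        e for e in entries
        if e["status"] == 200 and not e["path"].lower().endswith(IGNORED_SUFFIXES)
    ]

def identify_users(entries):
    """User identification heuristic: same host and same user agent = same user."""
    users = defaultdict(list)
    for e in entries:
        users[(e["host"], e["agent"])].append(e)
    return users

def identify_sessions(user_entries, timeout=SESSION_TIMEOUT):
    """Session identification: a gap longer than the timeout starts a new session."""
    sessions = []
    for e in sorted(user_entries, key=lambda e: e["timestamp"]):
        if sessions and e["timestamp"] - sessions[-1][-1]["timestamp"] <= timeout:
            sessions[-1].append(e)
        else:
            sessions.append([e])
    return sessions

# Made-up entries for illustration.
entries = [
    {"host": "a.example", "agent": "MSIE", "timestamp": 0, "path": "/index.htm", "status": 200},
    {"host": "a.example", "agent": "MSIE", "timestamp": 5, "path": "/logo.gif", "status": 200},
    {"host": "a.example", "agent": "MSIE", "timestamp": 600, "path": "/about.htm", "status": 200},
    {"host": "a.example", "agent": "MSIE", "timestamp": 4000, "path": "/index.htm", "status": 200},
    {"host": "b.example", "agent": "Mozilla", "timestamp": 100, "path": "/index.htm", "status": 200},
]

users = identify_users(clean(entries))
sessions_per_user = {user: identify_sessions(reqs) for user, reqs in users.items()}
for user, sessions in sessions_per_user.items():
    print(user, [[e["path"] for e in s] for s in sessions])
```

The host-plus-agent heuristic is fragile when proxies or shared machines are involved, which is one reason user identification is listed as its own step.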

13 Data Preparation

15 Tracking the Growth of a Site (Nielsen, Jakob, 1998) Exponential growth of the web and the internet Statistical method  Logarithmic transformation so that linear regression can be applied Statistical analysis  Hypothesis: the site is growing (number of pageviews and date are correlated)  R² and significance

16 Tracking the Growth of a Site R² = 0.96, p < 0.001
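The statistical method can be sketched in a few lines: take the logarithm of the pageview counts, fit a straight line against time, and read the growth rate off the slope. The pageview numbers below are made up (an exact doubling each month), so the fit is perfect here; real data would give an R² below 1. The p-value is not computed in this sketch.

```python
import math

def linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical monthly pageviews that double every month.
months = [0, 1, 2, 3, 4, 5]
pageviews = [1000 * 2 ** m for m in months]

# Exponential growth becomes a straight line after the log transformation.
log_views = [math.log(v) for v in pageviews]
slope, intercept, r_squared = linear_regression(months, log_views)

growth_per_month = math.exp(slope) - 1  # fractional growth per month
print(f"growth per month: {growth_per_month:.0%}, R^2 = {r_squared:.2f}")
```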

17 Tracking the Growth of a Site Predict growth rate  Clean noise  Confidence interval

18 Predicting Document Access in Large, Multimedia Repositories (Recker, M. R. and J. E. Pitkow, 1996) Patterns of document requests in network-accessible multimedia databases Main idea  Two related domains: human memory and libraries  Borrow models and research results from them

19 Predicting Document Access The model – human memory (Anderson and Schooler)  The relationship of recency and performance is a power function  The relationship of frequency and performance is a power function  Two parameters for performance Need probability p and Need odds p/(1-p)  The linear function: Log(Need odds) = a Log(Frequency) + b
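The step from "power function" to "linear function" is worth spelling out. Writing the need odds as $O$, the frequency as $F$, and introducing a constant $c$ (a symbol not used on the slide), a power relationship becomes a straight line once both sides are logged:

```latex
\begin{align*}
O &= \frac{p}{1 - p}
  && \text{(need odds from need probability)} \\
O &= c\,F^{a}
  && \text{(power function of frequency)} \\
\log O &= a \log F + \log c = a \log F + b
  && \text{(with } b = \log c\text{)} \\
p &= \frac{O}{1 + O}
  && \text{(recovering the probability from the odds)}
\end{align*}
```

The same argument applies to recency, with the elapsed time in place of $F$ and a negative exponent.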

20 Predicting Document Access Applying the Human Memory Analysis to a Document Request Model  Dataset: log file of Georgia Tech WWW repository  A dynamic information ecology  Frequency analysis Regression equation: Log(Need Odds) = 0.99 Log(Frequency) - 1.30  Recency analysis Regression equation: Log(Need Odds) = -1.15 Log(Days) + 0.41  Combining recency and frequency
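To see what the two regression equations imply, the log odds can be converted back into a need probability. The sketch below assumes natural logarithms, which the slide does not state, so the absolute probabilities are illustrative only; the direction of each effect does not depend on the base. The function name `need_probability` is hypothetical.

```python
import math

def need_probability(x, slope, intercept):
    """Convert Log(Need Odds) = slope * Log(x) + intercept into a probability.

    Assumes natural logarithms; the slide does not specify the base.
    """
    odds = math.exp(slope * math.log(x) + intercept)
    return odds / (1 + odds)

# Coefficients as given on the slide.
def p_from_frequency(frequency):
    return need_probability(frequency, slope=0.99, intercept=-1.30)

def p_from_recency(days):
    return need_probability(days, slope=-1.15, intercept=0.41)

for frequency in (1, 2, 5):
    print(f"frequency {frequency}: p = {p_from_frequency(frequency):.2f}")
for days in (1, 2, 7):
    print(f"{days} day(s) since last access: p = {p_from_recency(days):.2f}")
```

Under these equations the predicted need probability rises with how often a document was requested and falls with the number of days since it was last requested.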

21 Predicting Document Access Conclusion  Recency and frequency of past document access are strong predictors of future document access  Recency proved to be a stronger predictor than frequency Applications for the design of information systems  Determine optimal ordering of retrieved items  Inform design decisions  Design of caching algorithms

