Web Analytics Xuejiao Liu INF 385F: WIRED Fall 2004.



Outline
Introduction
- What is Web Analytics
- Why Web Analytics matters
Secondary readings
- Log file analysis
- Web usage mining
- Data preparation
- KDD process
- Document access in repositories

Log File Lowdown (Michael Calore, 2001)
Log file
What is in a log file
- Traffic
- Audience
- Browsers/Platforms
- Errors
- Referers

Log File Lowdown
Sample log file
adsl ilm.bellsouth.net - - [09/May/2001:13:42: ] "GET /about.htm HTTP/1.1" “ "Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"
Log file analyzers
- WebTrends, Sawmill, Analog, Webalizer, HTTP-analyze
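As a rough illustration of what a log file analyzer does with entries like the one above, here is a minimal Python sketch. It assumes the server writes the common "combined" log format; the sample lines, host names, status codes, and byte counts are hypothetical (the slide's own entry is incomplete and is not reproduced here), and `parse_line` and `summarize` are illustrative names, not part of any of the tools listed.

```python
import re
from collections import Counter

# Assumed layout (combined log format):
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def summarize(lines):
    """Count page requests, error statuses, referers, and browsers."""
    pages, errors, referers, agents = Counter(), Counter(), Counter(), Counter()
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip malformed lines
        pages[entry["path"]] += 1
        if int(entry["status"]) >= 400:
            errors[entry["status"]] += 1
        if entry["referer"] not in ("", "-"):
            referers[entry["referer"]] += 1
        agents[entry["agent"]] += 1
    return {"pages": pages, "errors": errors, "referers": referers, "agents": agents}


# Hypothetical entries, invented for this example only.
SAMPLE_LINES = [
    'host1.example.net - - [09/May/2001:13:42:07 -0700] "GET /about.htm HTTP/1.1" '
    '200 3741 "http://www.example.com/index.htm" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"',
    'host2.example.net - - [09/May/2001:13:43:10 -0700] "GET /about.htm HTTP/1.1" '
    '200 3741 "-" "Mozilla/4.7 [en] (WinNT; I)"',
    'host1.example.net - - [09/May/2001:13:44:55 -0700] "GET /missing.htm HTTP/1.1" '
    '404 512 "http://www.example.com/about.htm" '
    '"Mozilla/4.0 (compatible; MSIE 5.0; Windows 98)"',
]

report = summarize(SAMPLE_LINES)
```

The four counters correspond loosely to the traffic, errors, referers, and browsers/platforms categories named earlier; audience figures would additionally require grouping by host.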

WebTrends log file analyzer
Advantages
- Fast and effective
- User-friendly interface
- Feature-rich
- Supports different operating systems
Disadvantages
- Not free

WebTrends

The KDD Process for Extracting Useful Knowledge from Volumes of Data (Fayyad, U., G. Piatetsky-Shapiro, et al. 1996)
KDD: Knowledge Discovery in Databases
- The value of data
- Definitions
  - KDD
  - Data mining

The KDD Process
The KDD process
1. Creating a target dataset
2. Preprocessing and data cleaning
3. Data reduction and projection
4. Data mining
   - Choosing the data mining function
   - Choosing the data mining algorithm
5. Interpretation and evaluation

The KDD Process
Data mining
- Data mining involves fitting models to, or determining patterns from, observed data
- Data mining algorithms
  - The model
  - The preference criterion
  - The search algorithm
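To make the three components concrete, here is a toy Python sketch, not taken from the paper. It assumes a straight-line model, a sum-of-squared-errors preference criterion, and an exhaustive grid search; the data and candidate grid are invented for illustration.

```python
def model(params, x):
    """The model: a straight line y = a * x + b."""
    a, b = params
    return a * x + b


def preference(params, data):
    """The preference criterion: sum of squared errors (lower is better)."""
    return sum((y - model(params, x)) ** 2 for x, y in data)


def search(data, candidates):
    """The search algorithm: try every candidate and keep the best-scoring one."""
    return min(candidates, key=lambda params: preference(params, data))


# Invented observations lying exactly on y = 2x + 1.
DATA = [(0, 1.0), (1, 3.0), (2, 5.0), (3, 7.0)]

# Candidate (a, b) pairs in steps of 0.5 between 0 and 4.
CANDIDATES = [(a / 2, b / 2) for a in range(9) for b in range(9)]

best = search(DATA, CANDIDATES)
```

Swapping any one component (a different model family, a likelihood-based criterion, or a gradient-based search) yields a different algorithm while the other two stay unchanged.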

The KDD Process
Data mining
- Model functions
  - Classification
  - Regression
  - Clustering
  - Dependency modeling
  - Link analysis
- Goals of data mining
  - Predictive and descriptive

Data Preparation for Mining World Wide Web Browsing Patterns (Cooley, R. W., B. Mobasher, et al. 1999)
Web usage mining vs. data mining
The WEBMINER process
- Preprocessing
- Mining algorithms
- Pattern analysis

Data Preparation
Preprocessing
- Data cleaning
- User identification
- Session identification
- Path completion
- Formatting
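The first three preprocessing steps can be sketched in Python as follows. This is only an illustration under stated assumptions: image and stylesheet requests are treated as noise, a user is approximated by the (host, user agent) pair, and a session ends after 30 minutes of inactivity. The suffix list, the timeout value, and the sample records are choices made for this example. Path completion and formatting are not covered.

```python
from datetime import datetime, timedelta

# Assumption: requests for these file types are not page views.
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
# Assumption: a gap longer than this starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def clean(records):
    """Data cleaning: drop requests for embedded resources."""
    return [r for r in records if not r["path"].lower().endswith(IGNORED_SUFFIXES)]


def identify_users(records):
    """User identification: group records by (host, user agent)."""
    users = {}
    for record in records:
        users.setdefault((record["host"], record["agent"]), []).append(record)
    return users


def sessionize(user_records):
    """Session identification: split one user's requests at long idle gaps."""
    sessions = []
    for record in sorted(user_records, key=lambda r: r["time"]):
        if sessions and record["time"] - sessions[-1][-1]["time"] <= SESSION_TIMEOUT:
            sessions[-1].append(record)
        else:
            sessions.append([record])
    return sessions


def rec(host, agent, minute, path):
    """Build a hypothetical log record `minute` minutes after 13:00."""
    start = datetime(2001, 5, 9, 13, 0)
    return {"host": host, "agent": agent,
            "time": start + timedelta(minutes=minute), "path": path}


RECORDS = [
    rec("10.0.0.1", "MSIE 5.0", 0, "/index.htm"),
    rec("10.0.0.1", "MSIE 5.0", 0, "/logo.gif"),
    rec("10.0.0.1", "MSIE 5.0", 5, "/about.htm"),
    rec("10.0.0.1", "MSIE 5.0", 50, "/index.htm"),
    rec("10.0.0.1", "Mozilla/4.7", 2, "/index.htm"),
]

users = identify_users(clean(RECORDS))
sessions_by_user = {user: sessionize(recs) for user, recs in users.items()}
```

In this invented data the two browsers share one host address, so the user agent is what separates them; that is the kind of case the (host, agent) heuristic is meant to handle, and it will still merge distinct users who share both.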

Data Preparation

Tracking the Growth of a Site (Nielsen, Jakob, 1998)
Exponential growth of the web and the internet
Statistical method
- Logarithmic transformation of the data so that linear regression can be applied
Statistical analysis
- Hypothesis: the site is growing (number of pageviews and date are correlated)
- R² and significance
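A minimal Python sketch of the statistical method: take the logarithm of the pageview counts, fit a straight line by ordinary least squares, and report R². The pageview series is synthetic (roughly 10% monthly growth with alternating noise) and is not Nielsen's data. The significance test is omitted here.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, plus R²."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Synthetic monthly pageviews, invented for this example.
MONTHS = list(range(12))
PAGEVIEWS = [1000 * 1.10 ** m * (1.05 if m % 2 else 0.95) for m in MONTHS]

# Exponential growth becomes a straight line after the log transformation.
log_views = [math.log(v) for v in PAGEVIEWS]
slope, intercept, r_squared = linear_fit(MONTHS, log_views)

monthly_growth = math.exp(slope) - 1    # fractional growth per month
doubling_time = math.log(2) / slope     # months needed to double traffic


def predict(month):
    """Extrapolate pageviews for a future month from the fitted line."""
    return math.exp(intercept + slope * month)
```

Because the fit is done in log space, the slope is a growth rate rather than an absolute increment, which is why it converts directly into a doubling time.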

Tracking the Growth of a Site
R² = 0.96, p < 0.001

Tracking the Growth of a Site
Predict growth rate
- Clean noise
- Confidence interval

Predicting Document Access in Large, Multimedia Repositories (Recker, M. R. and J. E. Pitkow, 1996)
Patterns of document requests in network-accessible multimedia databases
Main idea
- Two related domains: human memory and libraries
- Borrow models and research results from them

Predicting Document Access
The model: human memory (Anderson and Schooler)
- The relationship between recency and performance is a power function
- The relationship between frequency and performance is a power function
- Two measures of performance
  - Need probability p
  - Need odds p/(1-p)
- The linear function: Log(Need Odds) = a Log(Frequency) + b

Predicting Document Access
Applying the human memory analysis to a model of document requests
- Dataset: log file of the Georgia Tech WWW repository
- A dynamic information ecology
- Frequency analysis
  - Regression equation: Log(Need Odds) = .99 Log(Frequency) - 1.30
- Recency analysis
  - Regression equation: Log(Need Odds) = Log(Days) + .41
- Combining recency and frequency
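The frequency regression can be turned into a need probability with a few lines of Python. This is a sketch under assumptions: the slide does not state the logarithm base, so natural logs are assumed here and exposed as a `base` parameter, and the function names are illustrative. The recency equation is left out because its slope is not legible in the transcript.

```python
import math


def need_odds(frequency, slope=0.99, intercept=-1.30, base=math.e):
    """Need odds from Log(Need Odds) = slope * Log(Frequency) + intercept.

    The log base is an assumption; it only rescales the intercept term,
    since odds = base**intercept * frequency**slope (a power function).
    """
    log_odds = slope * math.log(frequency, base) + intercept
    return base ** log_odds


def odds_to_probability(odds):
    """Convert need odds p/(1-p) back to need probability p."""
    return odds / (1 + odds)


def probability_to_odds(p):
    """Convert need probability p to need odds p/(1-p)."""
    return p / (1 - p)


# Need probability for a few illustrative access frequencies.
probabilities = {f: odds_to_probability(need_odds(f)) for f in (1, 2, 5, 10)}
```

With a slope this close to 1, the predicted need odds are nearly proportional to past access frequency.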

Predicting Document Access
Conclusion
- Recency and frequency of past document access are strong predictors of future document access
- Recency proved to be a stronger predictor than frequency
Applications to the design of information systems
- Determine optimal ordering of retrieved items
- Inform design decisions
- Design of caching algorithms